Memories relating to a painful, negative event are adaptive and can be stored for a lifetime to support preemptive avoidance,
escape, or attack behavior. However, under unfavorable circumstances such memories can become overwhelmingly
powerful. They may trigger excessively negative psychological states and uncontrollable avoidance of locations, objects, or 
social interactions. It is therefore obvious that any process to counteract such effects will be of value. In this context, we 
stress from a basic-research perspective that painful, negative events are "Janus-faced" in the sense that there are actually 
two aspects about them that are worth remembering: What made them happen and what made them cease. We review pub- 
lished findings from fruit flies, rats, and man showing that both aspects, respectively related to the onset and the offset of 
the negative event, induce distinct and oppositely valenced memories: Stimuli experienced before an electric shock acquire 
negative valence as they signal upcoming punishment, whereas stimuli experienced after an electric shock acquire positive 
valence because of their association with the relieving cessation of pain. We discuss how memories for such punishment- and 
relief-learning are organized, how this organization fits into the threat-imminence model of defensive behavior, and what 
perspectives these considerations offer for applied psychology in the context of trauma, panic, and nonsuicidal self-injury. 



The acknowledged "negative" mnemonic effects of adverse ex- 
periences mostly relate to what happens before the onset of 
an aversive, painful event. However, there is a less widely ac- 
knowledged type of memory that relates to what happens after 
the offset of or after escape from such a painful event, at the 
moment of "relief" (Fig. 1) (we use "relief" to refer specifically 
to the acute effects of punishment offset; an equally legitimate 
yet broader use of the word in, e.g., "fear relief," encompasses 
any process that eases fear [Riebe et al. 2012]). Indeed, in exper- 
imental settings, it turns out that stimuli experienced before 
and during a punishing episode are later avoided as they signal 
upcoming punishment, whereas stimuli experienced after a 
painful episode can subsequently prompt approach behavior, ar- 
guably (Box 1) because of their association with the relieving 
cessation of pain (Konorski 1948; Smith and Buchanan 1954; 
Wolpe and Lazarus 1966; Zanna et al. 1970; Solomon and 
Corbit 1974; Schull 1979; Solomon 1980; Wagner 1981; 
Walasek et al. 1995; Tanimoto et al. 2004; Yarali et al. 2008, 
2009b; Andreatta et al. 2010, 2012; Yarali and Gerber 2010; 
Ilango et al. 2012; Navratilova et al. 2012; Diegelmann et al. 
2013b); for a corresponding finding in the appetitive domain, 
see Hellstern et al. (1998) and Felsenberg et al. (2013). Such re- 
lief can both support the learning of the cues associated with 
the disappearance of the threat and reinforce those behaviors 
that helped to escape it. Obviously, the positive conditioned 
valence of and ensuing learned approach behavior toward
such cues would decrease the probability of encountering the 
threat again and/or keep exposure time to a minimum. We review
the literature on what is known, and discuss what should be asked,
about the mechanisms of such punishment- and relief-learning in 
the fruit fly Drosophila, as well as in rat and in man, as in these 
three species fairly concordant approaches have been taken. 
This is timely, because despite the rich literature on punish- 
ment-learning (e.g., for reviews regarding Drosophila, see: 
Dubnau and Tully 2001; Heisenberg 2003; Gerber et al. 2004; 
Davis 2005; Keene and Waddell 2007; Kahsai and Zars 2011; re- 
garding Aplysia, see: Lechner and Byrne 1998; Baxter and Byrne 
2006; Benjamin et al. 2008; regarding rodents, see: Davis et al. 
1993; Fendt and Fanselow 1999; LeDoux 2000; Maren 2001; 
Christian and Thompson 2003; Fanselow and Poulos 2005; Pape 
and Pare 2010; regarding monkeys, see: Davis et al. 2008; regard- 
ing humans, see: Rosen and Schulkin 1998; Ohman and Mineka 
2001; Delgado et al. 2006; Ohman 2008; Davis et al. 2009; Milad 
and Quirk 2012), little is known about the neurobiological mech- 
anisms or the psychological corollaries of relief-learning. Such 
knowledge would be important also from an applied perspective: 
The more distinct the underlying processes of punishment- and 
relief-learning are, the more likely they contribute independently 
to pathology, and the easier it will be to selectively interfere with 
either of them. 



© 2014 Gerber et al. This article, published in Learning & Memory, is available
under a Creative Commons License (Attribution-NonCommercial 4.0 International),
as described at http://creativecommons.org/licenses/by-nc/4.0/.
Figure 1. Event-timing and valence. For the "Good" and the "Bad"
things in life, two aspects matter in particular: What made them 
happen? What made them cease? The diagram illustrates that the 
On-set of something good (top left; e.g., finding food, a salary increase) 
can act as a reward, while the On-set of something bad (bottom left; 
e.g., being stung by a bee, being sent to prison) can act as punishment. 
In turn, however, the Off-set of pain upon cooling the sting or release from 
prison entails an oppositely valenced aspect, relief (bottom right); likewise, 
having your lollipop pilfered as a kid or experiencing a salary cut entail 
negatively valenced frustration (top right). This review is chiefly concerned 
with the mnemonic consequences of punishment (red color code 
throughout) and relief (green color code throughout). We consistently 
plot those behavioral measures toward the top of the y-axes that indicate 
positive valence, and consistently plot measures indicating negative 
valence toward the bottom of the y-axes. Please note that despite detailed 
coverage of the good and the bad, the ugly is ignored throughout (Leone 
1966). 



Fly 

Punishment- and reward-learning 

When flies receive an odor followed by an electric shock, they sub- 
sequently avoid this odor because it predicts shock (Quinn et al. 
1974; Tully and Quinn 1985). Specifically, a two-group reciprocal- 
training paradigm is used (Fig. 3): One group of flies receives odor 
A followed by electric shock (denoted as "-"), whereas odor B is 
presented alone (A-/B). The second group of flies receives re- 
ciprocal training (A/B-). Then, both groups are tested in a 
forced-choice situation for their relative preferences between A 
and B. It turns out that punishment of A tips the balance between 
A and B in favor of B, while punishment of B biases choice in favor 
of A. An associative learning index (abbreviated as LI in Figs. 3, 5 
[below]) is then calculated on the basis of this difference in pref- 
erence between the two reciprocally trained groups. In addition 
to such punishment-learning, an appetitive version of the para- 
digm is available which uses sugar as reward (Tempel et al. 
1983): When flies receive an odor together with a sugar reward, 
they subsequently show an increase in preference for this odor 
because it predicts reward. Thus, the paradigm is "bivalent" in 
the sense that it can reveal both decreases in odor preference after 
punishment-learning, and increases in odor preference after 
reward-learning. This bivalent nature of the task is essential for 
the ensuing discussion. 
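
As a rough illustration of how such a learning index can be derived from the two reciprocally trained groups, the following Python sketch may help. The exact formula is not spelled out in this section, so the preference score (difference in fly counts, normalized to the total and scaled to 100) and the halved difference between the two groups are assumptions, chosen so that the index ranges from -100 to 100. The names `ChoiceCount`, `preference`, and `learning_index` are hypothetical, and the fly counts are made up.

```python
from dataclasses import dataclass


@dataclass
class ChoiceCount:
    """Number of flies found on each side of the forced-choice test."""
    n_odor_a: int
    n_odor_b: int


def preference(count: ChoiceCount) -> float:
    """Relative preference for odor A over odor B, from -100 to 100 (assumed formula)."""
    total = count.n_odor_a + count.n_odor_b
    if total == 0:
        raise ValueError("no flies counted")
    return 100.0 * (count.n_odor_a - count.n_odor_b) / total


def learning_index(a_trained: ChoiceCount, b_trained: ChoiceCount) -> float:
    """Half the difference in preference between the reciprocal groups (assumed formula).

    a_trained: group for which odor A was paired with the reinforcer (e.g., A-/B).
    b_trained: group that received the reciprocal training (e.g., A/B-).
    Negative values indicate avoidance of the trained odor, positive values approach.
    """
    return (preference(a_trained) - preference(b_trained)) / 2.0


# Made-up counts: punishing A shifts flies toward B, punishing B shifts them toward A.
example_li = learning_index(ChoiceCount(20, 60), ChoiceCount(60, 20))
print(example_li)  # -50.0
```

With these invented counts the index is negative, corresponding to conditioned avoidance after punishment-learning; the same computation applied to reward-learning data would be expected to give a positive value, which is what makes the measure bivalent.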

The cellular and molecular networks underlying short-term 
memory after punishment-learning have been studied in some de- 
tail (Fig. 4; regarding longer-term forms of memory and mecha- 
nisms of memory consolidation, see the recent studies by, e.g., 
Placais et al. 2012 or Perisse et al. 2013 and references therein). 
Briefly, upon presentation of an odor, a particular combination 
of olfactory sensory neurons on the antennae and maxillary palps 
is activated according to the ligand profiles of the respectively expressed
receptor proteins. As a rule, all sensory neurons expressing
the same receptor protein then converge at a single glomerulus 
in the antennal lobe, where they provide output to mostly uni- 
glomerular projection neurons. The combination of projection 
neurons activated by a given odor is shaped, in addition, by lateral 
connections among the antennal lobe glomeruli. In the next step, 
projection neurons connect to both mushroom body Kenyon cells 
and lateral horn neurons as the third-order processing stages (see 
Laurent et al. 2001 for a discussion of temporal-coding aspects in 
olfaction). The electric shock, in turn, triggers a reinforcement sig- 
nal, likely in a subset of dopaminergic neurons which carry this 
signal to many if not all Kenyon cells (Schwaerzel et al. 2003; 
Riemensperger et al. 2005; Kim et al. 2007; Claridge-Chang et al. 
2009; Mao and Davis 2009; Aso et al. 2010, 2012; Pech et al. 
2013; for larval Drosophila, see Schroll et al. 2006; Selcho et al. 
2009). In contrast, as mentioned above, only a subset of the 
Kenyon cells is activated by the odor. It is only in these particular 
Kenyon cells, due to the coincidence of the odor-induced activity 
and the shock-induced reinforcement signal, that an odor-shock 
short-term memory trace is formed. Such a memory trace conceivably
consists in a modulation of the connection between the Kenyon
cells and their output neurons (Sejourne et al. 2011), with the 
AC-cAMP-PKA signaling cascade as one of the necessary processes 
involved in molecular coincidence detection (Zars et al. 2000; 
Thum et al. 2007; Blum et al. 2009; Tomchik and Davis 2009; for 
a similar conclusion for appetitive learning, see Gervasi et al. 
2010; for discussions of purely physiology-based conclusions 
about memory trace localization, see Heisenberg and Gerber 
2008; for a general critique of the concept of memory trace locali- 
zation, see Menzel 2013). If, after training, the learned odor is per- 
ceived, activity in the mushroom body output neurons — by virtue 
of their modified input from the Kenyon cells — is altered such that 
conditioned olfactory avoidance can take place (Sejourne et al. 
2011). Interestingly, in accord with prediction-error signaling 
(Rescorla 1988), punishment-trained odors not only enable condi- 
tioned behavior but also apparently induce feedback onto dopami- 
nergic neurons (Riemensperger et al. 2005; see the seminal study of 
Hammer 1993 for a corresponding finding in honeybee appetitive 
learning). Other, nontrained odors support conditioned avoidance
only to the extent that they are similar in quality (Niewalda
et al. 2011; Campbell et al. 2013; Barth et al. 2014) and/or intensity
(Yarali et al. 2009a) to the actually trained odor. Appetitive learning,
using sugar as reward, is in principle organized in a similar
way (e.g., Schwaerzel et al. 2003; Keene et al. 2006; Trannoy et al.
2011), with at least two significant differences:

• It has been argued that, in addition to the Kenyon cells, there 
may be an odor-reward short-term memory trace in the projec- 
tion neurons (Drosophila, Thum et al. 2007; honeybee, Menzel 
2001; Giurfa and Sandoz 2012; Menzel 2012; but see Peele 
et al. 2006). However, in Drosophila at least, independent confir- 
mation of this is so far lacking (see discussions in Heisenberg 
and Gerber 2008; Michels et al. 2011). 

• The reinforcing effect of reward involves aminergic neurons 
distinct from those mediating punishment (Schwaerzel et al. 
2003; Burke et al. 2012; Liu et al. 2012), such that distinct sets 
of Kenyon cell output synapses are likely to be modulated to 
accommodate conditioned approach. 
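
The coincidence requirement for the odor-shock memory trace described in this section can be caricatured in a few lines of Python. This is a toy sketch under strong assumptions, not a model taken from the literature: each Kenyon cell output synapse is reduced to a single number, the reinforcement signal is a single boolean that reaches all cells, and the size and sign of the synaptic change (`change`) are arbitrary. The function name `form_memory_trace` is hypothetical.

```python
from typing import List


def form_memory_trace(
    weights: List[float],
    odor_active: List[bool],
    reinforcement: bool,
    change: float = -0.5,  # arbitrary illustrative value, not a measured quantity
) -> List[float]:
    """Modify output weights only in cells where odor activity and reinforcement coincide.

    weights: one output-synapse strength per Kenyon cell (toy representation).
    odor_active: which Kenyon cells the odor activates (only a subset is active).
    reinforcement: whether the shock-induced signal is present (reaches all cells).
    """
    if len(weights) != len(odor_active):
        raise ValueError("weights and odor_active must have the same length")
    return [
        w + change if (active and reinforcement) else w
        for w, active in zip(weights, odor_active)
    ]


# Four toy Kenyon cells; the odor activates cells 0 and 2 only.
before = [1.0, 1.0, 1.0, 1.0]
odor = [True, False, True, False]

paired = form_memory_trace(before, odor, reinforcement=True)
odor_alone = form_memory_trace(before, odor, reinforcement=False)
print(paired)      # [0.5, 1.0, 0.5, 1.0]
print(odor_alone)  # [1.0, 1.0, 1.0, 1.0]
```

In this sketch, neither the odor alone nor the reinforcement alone changes anything; only the odor-activated cells that also receive the reinforcement signal are modified, which is the sense in which the trace is odor-specific.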

The net effect of driving or blocking signaling through a set
of dopaminergic neurons covered by the TH-Gal4 expression pattern
is to induce or impair, respectively, aversive memories
(e.g., Schwaerzel et al. 2003; Schroll et al. 2006; Aso et al. 2010,
2012). The corresponding manipulations using the TDC-Gal4 expression
pattern to drive octopaminergic neurons, or the TβH^M18
mutant (CG1543) to impair octopamine biosynthesis, have
BOX 1. Relief-learning – safety-learning



We use the term relief-learning specifically to imply the learning of an 
association between a stimulus and the offset of punishment. It was 
Konorski (1948) who initially proposed that the occurrence of a stimulus
during the falling phase of a punishment signal could result in a
positively valenced memory for this stimulus. In other words, in addi- 
tion to their punishing effect, punishments may induce a delayed 
state of relief supporting positively valenced memories. 

An alternative view is that repeated exposure to the punishment 
within the experimental context establishes this context as a danger- 
ous one. Within such a dangerous context, a stimulus that is present- 
ed in an explicitly unpaired manner with the punishment could come 
to signal a subsequent period of safety. In other words, the actual 
absence of a contextually predicted punishment during and after the 
stimulus can lead to a positive prediction error ("better-than expect- 
ed"), supporting a positively valenced memory for it (Rescorla 1988 
and references therein). 

Safety-learning, but not relief-learning, would thus be possible 
with unpaired or temporally randomized presentations of stimulus 
and punishment (for review, see Kong et al. 2013). In such safety-learning
procedures, the stimulus becomes a conditioned inhibitor
predicting the nonoccurrence of the punishment and can reduce con- 
textual fear, increase the exploration of unprotected environments, 
act as a positive secondary reinforcer, reduce immobility in the forced 
swim test, ameliorate the consequences of chronic mild stress, and 
promote neurogenesis (Rogan et al. 2005; Pollak et al. 2008; 
Christianson et al. 2012). Notably, after safety-learning of an auditory 
stimulus, less-than baseline evoked auditory potentials are induced in 
the lateral amygdala but increased evoked potentials in the caudopu- 
tamen (Rogan et al. 2005). At the same time, there is an activation of 
inhibitory pathways inside the amygdala that is different from those 
activations underlying fear extinction (Amano et al. 2010). No amyg- 
daloid effects are found for relief-learning, however (see main text) 
(Andreatta et al. 2012). We finally note that neurons in the basolateral 
amygdala that show increased firing to a safety-learned stimulus 
appear to partially overlap with those activated by reward-related 
stimuli (Sangha et al. 2013).

Returning to procedures that consistently present the stimulus 
shortly after punishment, the issue is that under these conditions the- 
oretically relief- as well as safety-learning could occur (see Fig. 2). 
How can one, in these cases, experimentally discriminate between 
relief- and safety-based explanations (see also discussions in Wagner 
and Larew 1985; Malaka 1999)?

• In a repetitive training design, safety-learning should monotonically 
decrease as the inter-stimulus interval (ISI) lengthens, because the 
stimulus moves closer in time toward the next punishment, short- 
ening the following safety period. In most preparations looked at, 
the ISI-dependency appears bell-shaped (Plotkin and Oakley 1975; 
Maier et al. 1976; Tanimoto et al. 2004; for related findings, see 
Hellstern et al. 1998; Franklin et al. 2013a; for a counter-example, 
see Moscovitch and LoLordo 1968), fitting more naturally to a 
relief-based explanation, as the relief signal peaks shortly after pun- 
ishment offset. However, the differences in the shape of the 
ISI-functions as proposed by relief- and safety-based models are 
subtle (Malaka 1999 loc. cit. Figs. 4b,5) and difficult to ascertain 
experimentally, in particular when the learning scores are low. 

• Safety-learning arguably requires multiple training trials, because 
only when the context is sufficiently "charged" by previous trials 
will there be a positive prediction error during the subsequent
trials, and the stimulus can become a safety-signal. While a single
presentation of a stimulus with punishment offset can result in a 
positively valenced memory (e.g., Wagner and Larew 1985; for 
related findings, see Hellstern et al. 1998; Franklin et al. 2013a), in 
most cases multiple pairings are required (Heth 1976 and referenc- 
es therein; for Drosophila, see Yarali et al. 2008). Note that, unless 
one assumes implausibly rapid context-learning, the sufficiency of a 
single training trial argues against a safety-based explanation. A re- 
quirement for multiple trials, on the other hand, is consistent with 
both relief- and safety-based explanations (for detailed discussions, 
see Heth 1976; Wagner and Larew 1985; Malaka 1999). 
• As noted, safety-learning should rely on the value of the experimen- 
tal context as a signal for the punishment. In rats, Chang et al. 
(2003) argued for such context-dependency and thus for a safety- 
based explanation: An extinction treatment for the context-punish- 
ment association diminished the effect of prior punishment-stimu- 
lus training. However, following a very similar reasoning, Yarali 
et al. (2008) tested in a Drosophila paradigm whether the initial 
punishment-stimulus pairings of a multiple-trial training session 
can be substituted by mere exposure to punishment within the ex- 
perimental context. This was not found to be the case, offering no 
support for a safety-based explanation. Neither argument is particu- 
larly strong, though, because Chang et al. (2003) did not actually 
demonstrate extinction of the context-punishment association, 
and because in flies direct evidence for contextual learning under 
the employed conditions is lacking. We note that in honeybee 
learning using odors and a sugar reward, changing the context 
during the inter-stimulus interval leaves scores unaffected, also 
failing to provide evidence for context-mediation (Hellstern et al. 
1998). 

Obviously, these experimental strategies have given mixed or 
tentative results across studies, paradigms, and species, making firm 
and general conclusions premature. By our assessment, however, the 
relief-based explanation seems to be in the lead at this point — when it 
comes to grasping the mnemonic processes related to punishment 
offset. 

We note that despite the above-sketched conceptual dichotomy, 
animals may well have the capacity for both relief- and safety- 
learning, so the parametric boundary conditions for their respective 
operation and their underlying mechanisms would need to be clari- 
fied. Indeed, one can imagine experiments that would parametrically 
turn experimental procedures ideal for relief-learning (pairing of the 
stimulus with punishment offset) into paradigms optimal for safety- 
learning (unpaired presentations of punishment and stimulus). 
corresponding net effects in the appetitive domain (e.g.,
Schwaerzel et al. 2003; Schroll et al. 2006; Burke et al. 2012). 
There is, however, clearly no simple dichotomy in reinforce- 
ment processing, dopamine for punishment and octopamine for 
reward: A subset of dopaminergic neurons from the so-called
PAM-cluster mediates appetitive reinforcement (Burke et al.
2012; Liu et al. 2012), and genetic distortions of the dopamine receptor
gene Dop1R1 (CG9652, also known as dumb) fittingly compromise
both appetitive and aversive learning (Kim et al. 2007;
Burke et al. 2012; Qin et al. 2012). Current work is beginning to 
disentangle the specific roles for subsets of aminergic neurons,
their target Kenyon cells, and the respectively expressed amine re- 
ceptors in the acquisition, consolidation, or retrieval of various 
forms of olfactory memory as well as in motivational control of 
behavior (e.g., Krashes et al. 2009; Berry et al. 2012; Placais et al. 
2012; Perisse et al. 2013; see also Burke et al. 2012); for these as- 
pects too, the roles of dopamine and octopamine apparently 
transgress the above-mentioned dichotomy. Thus, the striking 
valence-specificity of driving or impairing aminergic signaling 
seems to be a property of particular neurons and of their specific 
target receptors and cellular connectivity, rather than a property 
of transmitter systems as a whole. A similar picture may be emerging
for mammals as well (Brischoux et al. 2009; Bromberg-Martin
et al. 2010; Ilango et al. 2012; Lammel et al. 2012). Strikingly, even 
within the aversive domain there may be dissociations on the lev- 
el of aminergic neurons: Activations caused by aversive air puff or 
bad tastants in individual midbrain dopamine neurons in the 
monkey were either not observed at all (most midbrain dopamine 
neurons are instead activated strongly by unpredicted rewards and 
reward-predicting stimuli [Schultz 2013]) or were seen only for 
aversive air puff, or only for bad taste — but not both (Fiorillo 
et al. 2013b, loc. cit. Fig. 9A) (about a third of the neurons, however, were
inhibited by both). This could provide the basis for memories that are specific for
the kind, rather than the mere aversiveness, of the punishment employed; indeed,
distinct dopamine neurons were activated by cues associated with air puff
versus cues associated with bad taste (Fiorillo et al. 2013b, loc. cit. Fig. 9B) (again,
about a third of the neurons were inhibited by both). In flies it has so far not been
systematically tested whether memories refer to the specific quality of the reinforcer
(but see Eschbach et al. 2011); if they did, this would call for a fundamental
revision of current thinking on how memories are organized in the Drosophila
brain.



Relief-learning 

Compared to punishment- and reward-learning, relief-learning in Drosophila
(Figs. 3, 5) is much less well understood. It was first described by Tanimoto et
al. (2004): If an odor is presented after shock, flies subsequently approach that
odor, arguably (Box 1) because of its association with the relieving cessation of
shock (Figs. 3, 5). That is, a fairly minor parametric change such as inverting
event timing during training has a rather drastic qualitative effect: It inverts
behavioral valence, such that odor → shock training establishes conditioned
avoidance, while shock → odor training establishes conditioned approach toward the
odor.
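
The sign convention for the inter-stimulus interval (ISI) used in Figure 3, where the ISI is the time from shock onset to odor onset, can be made explicit with a small Python sketch. The function name `training_order` and the label used for an ISI of zero are assumptions for illustration; the ISI values and sample sizes are those listed in the legend of Figure 3F.

```python
from typing import List

# ISI values (sec) and sample sizes as listed in the legend of Figure 3F.
ISI_SEC: List[int] = [-150, -45, -15, 0, 20, 40, 70, 200]
SAMPLE_SIZE: List[int] = [8, 24, 32, 47, 24, 35, 12, 12]


def training_order(isi_sec: float) -> str:
    """Map the sign of the ISI onto the order of events during training.

    Negative ISI: odor onset precedes shock onset (odor -> shock training).
    Positive ISI: odor onset follows shock onset (shock -> odor training).
    """
    if isi_sec < 0:
        return "odor -> shock"
    if isi_sec > 0:
        return "shock -> odor"
    return "simultaneous onset"  # label for ISI = 0 is an assumption


for isi, n in zip(ISI_SEC, SAMPLE_SIZE):
    print(f"ISI {isi:>5} sec  N = {n:>2}  {training_order(isi)}")
```

The sketch only encodes event order; whether a given ISI actually yields conditioned avoidance or approach is an empirical matter addressed by the data in Figure 3.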

In a follow-up study, Yarali et al. (2008) demonstrated that after relief-training,
the preference toward the trained odor is indeed associatively increased;
this effect does not seem to interact with the innate valence of the odor,
but rather adds on to it. That study also suggested that
relief-learning is likely independent of context-shock associa- 
tions and that it needs more repetitive but less intense training 
than punishment-learning (Fig. 5; see Box 1 for the theoretical 
implications of these findings). Namely, relief-learning in Droso- 
phila reaches asymptote after six training trials (together with 
the given signal-to-noise ratio, this makes relief-learning 10- to 
20-fold more laborious than the standard one-trial version of 
punishment-learning; note that in all direct comparisons re- 
viewed in this paper nothing but the inter-stimulus interval is var- 
ied between punishment- and relief-learning procedures) and is 
optimal at relatively mild shock intensities. The latter may be 
because shocks that are too strong induce anterograde amnesic 
effects, such that a subsequent odor presentation remains unrec- 
ognized by the flies, preventing the relatively weak effects of 
relief-learning. Alternatively, it may be that the more intense 
the shock, the longer is the aversive state it induces, such that 
the ensuing relief is further delayed. In that case, the optimal tim- 
ing of the odor with respect to shock for yielding either type of 
memory would depend on shock intensity (Bayer et al. 2007). 
Both of these scenarios may offer perspectives on the nature of 
"trauma" (Box 2) in the sense that massive or mild adverse experiences
may induce punishment and relief memories to a lesser
or stronger extent.

The temporal pattern of decay of relief-memory differs from 
both punishment- and reward-memories (Fig. 5; Yarali et al. 
2008; Diegelmann et al. 2013b): Over the first 4 h following train- 
ing, relief-memory decays much more slowly than punishment- 
memory, as is reminiscent of the slow initial decay rate of reward- 
memory (Tempel et al. 1983; Schwaerzel et al. 2007; Krashes and 
Waddell 2008). For longer retention periods, however, only multi- 
ple, temporally spaced training trials result in longer-term (24-h) 
punishment-memory (Tully et al. 1994; Isabel et al. 2004; Diegel- 
mann et al. 2013b), whereas for longer-term reward-memory 
even a single training trial suffices (Krashes and Waddell 2008; 
Colomb et al. 2009). For relief-learning, despite using multiple, 
spaced training trials, Diegelmann et al. (2013b) found no longer- 
term relief-memory (Fig. 5). Also, unlike both punishment- and 
reward-learning, relief-learning fails to induce amnesia-resistant 



Figure 3. Punishment- and relief-learning in Drosophila. (A) Image of a female adult fruit fly 
Drosophila melanogaster (from Demerec and Kaufmann 1973). A typical female fly is ~3-4 mm in 
length; males are ~0.5 mm smaller. (B) Histological preparation of an adult Drosophila brain. Shown
is a frontal section stained using the reduced silver technique. Cell nuclei are visible as small purple 
dots; nerve tracts can be discerned as purple fibers. Blue and pale-blue stains indicate regions of synap- 
tic contact (neuropil). The mushroom body calyces can be distinguished as large, round, paired 
pale-blue structures toward the top of the preparation (from www.flybrain.org); please note that the 
laterally situated visual brain areas (retina, lamina, medulla, lobula), which comprise almost half of 
the fly's brain, are cropped from the image. A typical fly brain comprises ~100,000 neurons and is
~450 μm in width. (C) Wheel apparatus for Drosophila relief- and punishment-learning, partially disassembled
for clarity. For training, a cohort of 60-80 flies is loaded into a training tube lined inside with an
electrifiable copper grid (brown tube at top of device); to the left of the training tube, a black odor con- 
tainer can be discerned. These odor containers can be changed such that either odor A is presented with 
or another odor B without electric shock. After training, flies are transferred to a neighboring position on 
the wheel. In that position, visible in the lower part of the apparatus, two testing tubes are attached, 
each linked with an odor container, such that flies face a choice between the two odors. Air is sucked 
out of the apparatus by an exhaust pump, meaning that air flows from the outside via the odor containers
and tubes with the flies inside into the exhaust. The floor plate of the apparatus is ~30 x 30 cm in
size. When fully assembled, it allows the training and testing of four cohorts of flies simultaneously. (D) 
Sketch of the sequence of events for relief- and punishment-learning, using a between-group design. 
For both groups of flies, one odor (gray cloud) is presented temporally so far removed from the electric 
shock (typically 3-4 min) that no association takes place. For those flies that undergo relief-learning, a 
second odor (white cloud) is presented only after an electric shock, at the moment of "relief" 
(relief-learning) (please note that for subgroups the chemical identity of the involved odors is 
swapped). In contrast, for those flies that undergo punishment-learning this sequence is reversed, 
such that the second odor is presented before the shock. These respective cycles of training are repeated 
six times; then, flies are given the choice between the two odors in a final test. From this choice behavior 
a learning index (LI; range -100 to 100) is calculated such that positive scores imply conditioned approach to
the second odor, while negative values imply conditioned avoidance of it (for details, see Yarali et al.
2008; Diegelmann et al. 2013b). Note that when very long intervals between the second odor
(white cloud) and the shock are used, essentially both odors are presented in an explicitly unpaired 
way. This could either entail no learning at all, about either odor; or it could establish both odors as 
safety-signals (see Box 1). In either case, flies would distribute equally between the two odors in the 
final choice test (and this is indeed observed: see F). In other words, the employed discriminative train- 
ing-binary choice test paradigm "purifies" scores for punishment- and relief-learning, yet factors out 
safety-learning effects. (E) Experimental data showing punishment- or relief-memory, dependent on
the inter-stimulus interval (ISI) during training (the ISI is defined as the time interval from shock
onset to odor onset). The ISI is calculated such that a negative ISI implies odor → shock training,
while a positive ISI implies shock → odor training. The box plots show that for an ISI of -15 sec,
strong punishment-learning is observed in terms of conditioned avoidance of the odor (negative LIs),
while for an ISI of 40 sec, relief-learning is observed in terms of conditioned approach toward the 
odor, which is notably weaker. Box plots show the median as the middle line, and the 25%/75% 
and 10%/90% quantiles as box boundaries and whiskers, respectively. Coloring implies 
Bonferroni-corrected significance from chance, i.e., from zero. Sample sizes are N= 32 and 35 for 
punishment- and relief-memory, respectively. (F) Event-timing and conditioned valence. Test behavior 
is plotted across the ISI (-150, -45, -15, 0, 20, 40, 70, 200 sec). For clarity, only the median learning 
indices (LI) are displayed. As odor presentation during training is shifted in time past the shock episode, 
conditioned behavior changes valence from "Bad" to "Good": It turns from conditioned avoidance to 
conditioned approach. Coloring implies Bonferroni-corrected significance from chance, i.e., from zero. 
Sample sizes are N = 8, 24, 32, 47, 24, 35, 12, 12. Data in E,F taken from Yarali et al. (2009b). Gray 
circles represent data from Tanimoto et al. (2004), Yarali et al. (2008), or Diegelmann et al. (2013b);
the five gray triangles represent medians of datasets from unpublished experiments using the very 
same methods as the aforementioned papers. 



memory (Fig. 5; Diegelmann et al. 2013b). If both punishment- 
and relief-memories were formed in a natural string of events 
around a painful experience, these findings may be of practical im- 
portance: While trying to erase punishment-memory, one may 
unwittingly also erase relief-memory. Depending on the relative 
strength of these memories and the relative effectiveness of the 
treatment, the net effect of such manipulation may be to make 
the overall mnemonic effect of the experience even more adverse. 
Again, these conclusions may be of relevance for the understand- 
ing of "trauma" and its treatment (Box 2). 

In two further studies, the neurogenetic bases of fly re- 
lief-learning were investigated: First, the role of the white gene 
(CG2759) in relief-learning was tested (Yarali et al. 2009b). The 
white gene product forms one-half of an ATP-binding transmem- 
brane transporter (O'Hare et al. 1984). Its reported cargoes are 
the signaling molecule cGMP, as well as a heterogeneous set of 
molecules required for the synthesis of eye pigments and seroto- 
nin (Sullivan and Sullivan 1975; Howells et al. 1977; Sullivan 
et al. 1979, 1980; Koshimura et al. 2000; 
Evans et al. 2008). The white1118 mutation 
enhances punishment-learning (Diegelmann 
et al. 2006; Yarali et al. 2009b) 
and reduces relief-learning (Yarali et al. 
2009b), while leaving unaltered the re- 
flexive avoidance behavior toward the 
shock. It thus seems that specifically the 
mnemonic effect of shock, the "take- 
home message" of the painful event, is 



more negative for the white mutant. 
Whether this genetic effect in the 
white1118 mutant is molecularly related 
to altered levels of biogenic amines is a 
matter of controversy (Sitaraman et al. 
2008; Yarali et al. 2009b). Interestingly, 
genetic variance in the human homolog 
of the white gene (ABCG1) has been related 
to susceptibility to mood and panic 
disorders (Nakamura et al. 1999). 

Second, Yarali and Gerber (2010) 
compared relief-learning to both punish- 
ment- and reward-learning in terms of 
sensitivity to manipulations of aminergic 
signaling. Blocking synaptic output from 
a subset of dopaminergic neurons de- 
fined by the TH-Gal4 driver partially 
impairs punishment-learning (see also 
Schwaerzel et al. 2003; Aso et al. 2010, 
2012); however, relief-learning remains 
intact. As for the comparison to reward- 
learning, interfering with another set of 
dopaminergic neurons, which need to 
provide output for reward-learning to oc- 
cur (defined by the DDC-Gal4 driver) 
(Burke et al. 2012; Liu et al. 2012), also 
leaves relief-learning intact. Furthermore, 
the tβhM18 mutation, compromising 
octopamine biosynthesis, impairs 
reward-learning (see also Schwaerzel 
et al. 2003), but not relief-learning. It 
therefore appears that relief-learning is 
neurogenetically distinguishable from 
both punishment- and reward-learning. 
Importantly, however, roles of dopamine 
or octopamine signaling in relief-learn- 
ing cannot be ruled out because in none 
of the experiments described was the 
Figure 4. Simplified working model of punishment-learning in Drosophila. (A) Timing of events for punishment-learning, and indication of the time 
points at which snapshots of activity patterns are displayed in B1-B4. (B) Snapshots of stimulus-evoked activity during training (B1-B3) and testing 
(B4). Coloring indicates activity. Please note that both the odors as well as the electric shock activate two pathways each: one to trigger the respective 
innate behavior and one detour toward the mushroom body Kenyon cells. It is these detour pathways that feature coincidence detection and associative 
plasticity. Also, please note the combined divergence-convergence connectivity between the projection neurons and mushroom body Kenyon cells. 
(OSN) Olfactory sensory neurons, (AL) antennal lobe, (PN) projection neurons, (KC) mushroom body Kenyon cells, (DA) a subset of dopaminergic 
neurons mediating an internal punishment signal. (B1,B2) Depending on the ligand profiles of the expressed olfactory receptors and the connectivity 
within the circuit, a given odor (gray cloud in B1) activates a particular set of olfactory sensory neurons, antennal lobe glomeruli, projection neurons, 
and mushroom body Kenyon cells. A different odor (white cloud in B2) activates a different pattern of the respective neurons. As for both these odors 
the animals are experimentally naive, only innate olfactory behavior is expressed; that is, the output of the mushroom body Kenyon cells (open 
orange triangles) is not sufficiently strong to activate their postsynaptic partners and thus cannot steer conditioned escape. (B3,B4) Coincidence of a 
dopaminergic punishment signal and odor-induced activity in the odor-specific subset of mushroom body Kenyon cells (B3) (as flies are confined to the 
training tube, and no behavioral observations are possible [Fig. 3], the integration of innate shock-related escape with innate olfactory behavioral tendencies 
remains unknown). Coincidence is molecularly detected by the type I adenylate cyclase and arguably enacted, at least in part, by phosphorylation of 
synapsin and the recruitment of synaptic vesicles for subsequent release. If following this coincidence the previously punished odor is presented again 
(B4), it can activate a set of thus potentiated output synapses from the mushroom body Kenyon cells (filled triangles), such that conditioned escape 
is possible. Please note that the interplay of innate and conditioned olfactory behavior (B3) remains distressingly little understood. For clarity, the 
circuit is numerically simplified and altogether omits a number of neuronal classes including within-antennal lobe interneurons, multiglomerular 
projection neurons, and a host of modulatory inputs as well as of mushroom body intrinsic feedback neurons (for references to more detailed accounts, see main 
text; consult Laurent et al. 2001 for possible temporal-coding aspects of olfaction). 



respective type of aminergic signaling completely shut off, due to 
methodological constraints (for a detailed discussion, see Yarali 
and Gerber 2010). Also, in the tests for octopamine signaling, 
relief-learning scores in the genetic controls were low, making it 
practically impossible to ascertain decreases in relief-learning 
scores. However, because the data provided by Yarali and Gerber (2010) 
show, if anything, a tendency toward increased relief-learning scores, 
it still seems fair not to assume a strict requirement for octopamine 
in relief-learning. 

We would like to note that despite the above-mentioned dif- 
ferences across punishment-, reward-, and relief-learning, these 
three processes are certainly not completely distinct; indeed, 



much of the sensory and motor circuitry as well as the molecular 
mechanisms will be shared. Thus, the research problem is rather 
to understand exactly which processes are shared and which are 
specific across these three tasks — and how the fly brain is orga- 
nized to accommodate them. 

Possible mechanisms underlying relief-learning 

The relative timing of events is also a key factor in synaptic plastic- 
ity. Typically, in spike-timing dependent plasticity (STDP) (for re- 
views, see Bi and Poo 2001; Caporale and Dan 2008; Markram 
et al. 2011) a synaptic connection is strengthened when action 
Figure 5. Features of relief-learning in wild-type Drosophila. (A) Relief-learning requires multiple trials. 
Coloring implies Bonferroni-corrected significance from chance, i.e., from zero. Sample size from left to 
right: 16, 15, 20, 19, 23. Data taken from Yarali et al. (2008). (B) Relief-learning is strongest at intermediate 
shock intensities. Coloring implies Bonferroni-corrected significance from chance, i.e., from zero. 
Sample size from left to right: 8, 7, 12, 15, 7. Data taken from Yarali et al. (2008). (C) Time course of 
memory decay differs between relief-learning and punishment-learning. Relief-memory scores are 
stable for 75 min, yet have decayed fully by 24 h after training. In contrast, punishment-memory 
scores decay to about half within the first 75 min and then remain stable. Coloring implies 
Bonferroni-corrected significance from chance, i.e., from zero. Sample size N = 51, 35, 46, 43, 40 
for relief-memory and N = 20 in all cases of punishment-memory. Data taken from Diegelmann 
et al. (2013b). (D) Cold-amnesia abolishes relief-memory, but spares about half of punishment-memory 
scores. Coloring implies Bonferroni-corrected significance from chance, i.e., from zero. Sample size N = 
14 in all cases. Data taken from Diegelmann et al. (2013b). 



at the time scale of milliseconds, can in- 
deed lead to behavioral effects on the 
time scale of seconds. The authors mod- 
eled a circuit in which the odor activates 
a subset of Kenyon cells, whereas the 
shock excites their postsynaptic partner, 
which mediates avoidance. For both 
odor- and shock-induced activity, rather 
high firing rates were assumed that decay 
only slowly (several seconds) upon termi- 
nation of the respective stimulus. Then, 
the authors implemented an STDP rule, 
operating at the millisecond-scale (in- 
deed, STDP takes place at the Kenyon 
cell output synapses, as shown in the 
locust [Cassenaer and Laurent 2007, 
2012]). As long as relatively high spiking 
rates and relatively slow decay rates are 
assumed, this model does account for 
the effect of the relative timing of odor 
and shock at the behavioral level, which 
occurs at the scale of several seconds. 
However, the assumed strong and per- 
sistent spiking activity has not been 
observed in Drosophila Kenyon cells; 
rather, it turns out that these cells fire 
strikingly few spikes at the beginning 
and/or at the end of odor presentation 
(Murthy et al. 2008; Turner et al. 2008; 
for moth Kenyon cells, see Ito et al. 
2008). Also, it remains unclear whether 
the shock signal indeed impinges upon 
the postsynaptic partners of the Kenyon 
cells. The data so far rather suggest 
that the Kenyon cells themselves receive 
a dopaminergic reinforcement signal: 
Dopamine receptors are enriched in the 
Kenyon cells (Han et al. 1996; Kim et al. 
2003), and restoring receptor function 
in the Kenyon cells rescues the learning 
impairment of a dopamine receptor 
mutant (Kim et al. 2007). Furthermore, 
synaptic output from Kenyon cells dur- 
ing the training phase has repeatedly 
been found to be dispensable for pun- 
ishment-learning (Dubnau et al. 2001; 
McGuire et al. 2001; Schwaerzel et al. 
2003). Altogether, Drew and Abbott's 
(2006) STDP-based model thus does not 
embrace the empirical findings particu- 
larly well. We note, however, that the 
STDP rule is sensitive to neuromodula- 
tory effects, as shown for the locust 
Kenyon cell output synapses (Cassenaer 
and Laurent 2012), and incorporating 
such effects may result in more realistic 
STDP-based models of punishment-, re- 
ward-, and relief-learning. In any event, 



potentials in the presynaptic cell precede those in the postsynap- 
tic cell, while a reverse order of events weakens the synaptic 
connection. 
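
The timing rule described above can be made concrete with a minimal sketch of a pair-based STDP window. This is an illustration only: the exponential shape, the function name `stdp_weight_change`, and all parameter values (`a_plus`, `a_minus`, `tau_ms`) are assumptions chosen for readability and are not taken from any of the studies cited here.

```python
import math

def stdp_weight_change(delta_t_ms, a_plus=0.01, a_minus=0.01, tau_ms=20.0):
    """Weight change for one pre/post spike pair.

    delta_t_ms = t_post - t_pre (milliseconds).
    Positive values mean the presynaptic spike came first.
    All parameter values are illustrative assumptions.
    """
    if delta_t_ms > 0:
        # Pre before post: the synapse is strengthened.
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    if delta_t_ms < 0:
        # Post before pre: the synapse is weakened.
        return -a_minus * math.exp(delta_t_ms / tau_ms)
    return 0.0

if __name__ == "__main__":
    for dt in (-50, -10, 10, 50):
        print(dt, stdp_weight_change(dt))
```

In this sketch the sign of the change depends only on the order of the two spikes, and its size shrinks as the spikes move further apart in time, which is the property that invites the comparison with the behavioral switch between punishment- and relief-learning.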

Given the conspicuous parallelism of STDP on the synaptic 
level with the timing-dependent switch between punishment 
and relief-learning in Drosophila, Drew and Abbott (2006) ven- 
tured a computational study to establish whether STDP, operating 



the STDP-based model by Drew and 
Abbott (2006) does not predict safety-learning as a result of un- 
paired presentations of odor and shock. Such unpaired-train- 
ing can, however, have mnemonic consequences: In larval 
Drosophila unpaired presentations of odor and a sugar reward 
turn the odor into a predictor of no-reward (Saumweber et al. 
2011; Barth et al. 2014; concerning honeybees, see Hellstern 
et al. 1998 and references therein). Also, given the innate 
BOX 2. Punishment, relief, and "trauma" 



Remembering past adverse, punishing events is, in principle, adaptive 
since it helps us to avoid or to cope with future dangerous situations. 
However, emotional memories of extremely distressing "traumatic" 
events can become overwhelming, leading to psychiatric complica- 
tions such as post-traumatic stress disorder (PTSD). Core symptoms of 
PTSD are intrusions and flashbacks, i.e., unusually vivid memories of 
the traumatic event. Such traumatic events can be criminal victimizations, 
accidents, combat experiences, or childhood maltreatment 
(summarized in Nemeroff et al. 2006). While the frequency, type, and 
intensity of such episodes are critical determinants for developing 
PTSD, it remains striking that <30% of those exposed to comparable 
sequences of events go on to develop PTSD. There are several person- 
related risk factors such as polymorphisms in several genes (e.g., DRD2 
or MAO), female gender, and preexisting psychiatric conditions, such 
as depression and alcohol abuse, as well as ineffective coping strate- 
gies. To develop a conceptual handle on PTSD, therefore, not only do 
the status of the subject and aspects of punishment-learning have to 
be considered (e.g., genotype and personal history of the subject, 
graveness of the traumatic experiences, levels of generalization, the 
temporal dynamics of memory consolidation, retrieval-induced recon- 
solidation, extinction, and spontaneous recovery), but mechanisms of 
operant learning as well as nonassociative processes are also likely to be 
of significance (for discussions, see Siegmund and Wotjak 2006; Riebe 
et al. 2012). 

Such complexity makes it difficult to establish comprehensive 
experimental models of PTSD. Also, as the graveness of the events is 
of significance for the switch from adaptive aversive learning to 
trauma, it is intrinsically problematic to develop such experimental 
models because this graveness defines the boundaries of what is ethi- 
cally acceptable in experimental settings, not only in humans but in 



animals as well. In other words, if the experimental treatment is grave 
enough to induce PTSD, it may arguably not be ethical, whereas 
when it remains within ethical bounds, it may not be grave enough 
to induce PTSD. The existing rodent models of PTSD employ a single 
exposure to an intensive foot shock or to a predator to model the 
traumatic experience (e.g., Adamec and Shallow 1993; Siegmund 
and Wotjak 2007) and are useful for observing and understanding 
the long-lasting associative and nonassociative symptoms of fear 
(e.g., conditioned fear of contextual features of the traumatic experi- 
ence, social anxiety, neophobia, exaggerated startle). Such models 
should allow investigation into whether such experiences also lead 
to conditioned relief and whether such conditioned relief impacts 
"rodent-PTSD." 

Regarding PTSD in humans, we find it reasonable to suppose 
that, though perhaps restricted to an implicit level, there is a moment 
of relief upon the cessation of a traumatic event and that this may be of 
mnemonic, psychological, and behavioral significance to the trauma- 
tized person. In particular, to the extent that repetitions are "needed" 
to induce trauma, increasing or broadening of generalization of relief- 
type memories may be relevant entry points to ameliorate the impact 
of a first relatively mild such episode and/or to protect the patient in a 
possible next such episode. Furthermore, one may ask whether the ces- 
sation of intrusive flashback memories may cause relief, whether such 
relief contributes to the maintenance of the disorder or can be exploit- 
ed in therapy, or whether it may rather complicate therapy if such 
relief-learning repeatedly happens in therapeutic settings so as to un- 
wittingly induce attachment to these settings. Basic and translational 
research on punishment- and relief-learning with relatively mild aver- 
sive events may therefore also turn out useful with regard to trauma 
and PTSD. 



avoidance of odors typically observed in the current paradigm, the 
attraction to odors after unpaired presentations of odor and shock 
in adult Drosophila can be understood as reflecting safety-learning 
(e.g., Niewalda et al. 2011, loc. cit. Fig. S5; Barth et al. 2014). 

Alternatively, the event timing-dependent transition be- 
tween punishment and relief-learning may be rooted in the very 
molecular mechanism of coincidence detection. It is likely that 
during punishment-training the type I adenylate cyclase rutabaga 
(CG9533) acts as a molecular detector of the coincidence between 
a neuromodulatory reinforcement signal and the odor-evoked ac- 
tivity in the mushroom body Kenyon cells (Tomchik and Davis 
2009; Gervasi et al. 2010). Due to this coincidence, in the respec- 
tive odor-activated Kenyon cells, cAMP would be produced and 
PKA be activated, leading to the phosphorylation of downstream 
effectors, conceivably including Synapsin (CG3985). Synapsin 
phosphorylation would, in turn, allow the recruitment of reserve- 
pool vesicles toward the readily releasable pool (for discussion, see 
Diegelmann et al. 2013a), enabling a stronger synaptic output 
upon odor presentation after training and eventually leading to 
conditioned avoidance. Consistent with this scenario, the im- 
pairments in punishment-learning by mutations in the rutabaga 
and synapsin genes are not additive (Knapek et al. 2010); also, in 
odor-sugar associative learning of larval Drosophila, the impairment 
of the syn97 mutant cannot be rescued by a transgenically expressed 
synapsin that lacks functional PKA/CaMKII recognition 
sites (Michels et al. 2011; for recent data suggesting a role of 
CaMKII for Synapsin phosphorylation in olfactory habituation, 
see Sadanandappa et al. 2013). 

Could both punishment and relief-learning possibly be ac- 
commodated in the same Kenyon cells, based on an event timing- 
dependent, bidirectional modulation of AC-cAMP-PKA signaling? 
This was explored in a computational study by Yarali et al. (2012). 
In their model, upon the application of shock-alone, all Kenyon 



cells receive a shock-induced neuromodulatory signal, which trig- 
gers G-protein signaling, consequently activating the adenylate 
cyclase. As active adenylate cyclase accumulates, the reverse reac- 
tion, that is, deactivation of the adenylate cyclase, also takes 
place. Once shock is over and the neuromodulatory signal has 
waned, this deactivation of the adenylate cyclase becomes the 
dominant reaction, leading eventually to the deactivation of all 
adenylate cyclase molecules. The level of cAMP production 
caused by such shock-alone stimulation is taken as a "baseline" 
level and assumed to already potentiate the output from all 
Kenyon cells to the downstream avoidance circuit. Application 
of an odor, in turn, raises the presynaptic Ca2+ concentration 
specifically in the respective subset of model Kenyon cells coding for 
this odor. Such presynaptic Ca2+ multiplicatively increases the 
rate constants for both the G-protein-dependent activation of 
adenylate cyclase and its deactivation (Yovell and Abrams 1992; 
Abrams et al. 1998). Thus, the net effect of Ca2+ on the adenylate 
cyclase-dynamics depends on its timing. That is, Ca2+ has no net 
effect if it arrives long before the neuromodulatory activation 
of adenylate cyclase has begun, or long after the deactivation 
of adenylate cyclase has been completed. Note that this model 
thus also does not predict safety-learning resulting from unpaired 
presentations of odor and shock, although such learning likely 
does take place. On the other hand, if the Ca2+ arrives while 
adenylate cyclase-activation is dominant, it speeds up this activation, 
whereas if it arrives while adenylate cyclase is predominantly 
being deactivated, this deactivation is accelerated. Accordingly, in 
simulated punishment-training the shock-induced neuromodu- 
latory signal activates the adenylate cyclase in all Kenyon cells. 
Only in those Kenyon cells that code for the particular odor 
does a rise in Ca2+ coincide with this activation phase, speeding 
it up. The resulting above-baseline level of cAMP is then assumed 
to potentiate the output from these Kenyon cells further, beyond 
the potentiation in all Kenyon cells due to the shock-alone, thus 
enabling the trained odor subsequently to induce conditioned 
avoidance more readily than other odors. In turn, in simulated 
relief-training, the odor-induced rise in Ca2+ follows the shock-induced 
neuromodulatory signal and thus coincides with the deactivation 
phase of the adenylate cyclase. Therefore, in the particular 
odor-coding Kenyon cells, the adenylate cyclase is deactivated 
faster, resulting in cAMP production below the baseline level. 
Consequently, the output from these Kenyon cells is less potenti- 
ated compared to that of all other Kenyon cells (or compared to 
a situation where this odor is presented long before or long after 
the shock), rendering the trained odor less likely to induce con- 
ditioned avoidance than other odors. In a choice situation this 
would result in relative approach toward the trained odor. 
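
The timing logic of this scheme can be illustrated with a toy simulation. The sketch below is not the published model of Yarali et al. (2012): the single-variable kinetics, the function names `camp_production` and `relative_camp`, the rate constants, the stimulus durations, and the simple Euler integration are all assumptions chosen only to make the qualitative behavior visible. It keeps just two ingredients from the description above, namely that the neuromodulatory signal drives activation of the adenylate cyclase while deactivation runs throughout, and that Ca2+ multiplies both rates.

```python
def camp_production(odor_onset, odor_dur=5.0, shock_dur=5.0,
                    k_act=0.2, k_deact=0.1, ca_gain=4.0,
                    t_start=-60.0, t_end=200.0, dt=0.01):
    """Time-integral of the active adenylate cyclase fraction.

    Used here as a proxy for the amount of cAMP produced.
    Shock (neuromodulator) is on from t = 0 to shock_dur.
    odor_onset=None simulates the shock-alone baseline.
    All values are illustrative assumptions in arbitrary time units.
    """
    active = 0.0   # fraction of adenylate cyclase that is active
    camp = 0.0     # accumulated cAMP proxy
    n_steps = int(round((t_end - t_start) / dt))
    for i in range(n_steps):
        t = t_start + i * dt
        neuromod = 1.0 if 0.0 <= t < shock_dur else 0.0
        odor_on = (odor_onset is not None
                   and odor_onset <= t < odor_onset + odor_dur)
        # Ca2+ scales activation and deactivation by the same factor.
        boost = 1.0 + (ca_gain if odor_on else 0.0)
        d_active = boost * (k_act * neuromod * (1.0 - active)
                            - k_deact * active)
        active += d_active * dt
        camp += active * dt
    return camp

def relative_camp(odor_onset):
    """cAMP proxy relative to the shock-alone baseline."""
    return camp_production(odor_onset) - camp_production(None)

if __name__ == "__main__":
    for label, onset in [("odor long before shock", -50.0),
                         ("odor during shock", 0.0),
                         ("odor right after shock", 5.0),
                         ("odor long after shock", 150.0)]:
        print(label, relative_camp(onset))
```

Under these assumptions, an odor that coincides with the shock yields a value above baseline, an odor that directly follows the shock yields a value below baseline, and an odor presented long before or long after the shock leaves the value essentially at baseline. The sketch does not address how cAMP levels translate into synaptic potentiation or behavior.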

Thus, with this model, the timing-dependence of associative 
function on the behavioral level can be simulated, using Ca2+ as 
a stand-in for odor, and neuromodulator as a stand-in for shock. 
The bidirectional regulation of the very coincidence detector, 
the adenylate cyclase, is used to explain the effect of event-timing 
on learning. This now invites experimental scrutiny, especially 
with respect to the requirement for the AC-cAMP-PKA cascade 
and the Kenyon cells for relief-learning 
as well as the identity of the respective 
neuromodulatory signal. 

In any event, both Drew and 
Abbott's (2006) STDP-based model and 
the adenylate cyclase-based model by 
Yarali et al. (2012) follow the "canonical" 
view of short-term mnemonic odor cod- 
ing in the mushroom body, holding 
that odors are coded combinatorially 
across the full array of γ-lobe Kenyon 
cells (regarding longer-term memory, 
this has recently been challenged by 
Perisse et al. 2013, suggesting that the 
α/β set of Kenyon cells responsible 
for 3-h memory may be internally "mul- 
tiplexed" by valence and/or mem- 
ory strength). That is, punishment and 
relief-learning rely on the same olfactory 
representation such that both kinds of 
learning modify the same Kenyon cell 
output synapses onto the same down- 
stream circuit (depicted in Fig. 4 for 
punishment-learning) — but in oppo- 
site ways. While this scenario readily 
accounts for the observed opposite con- 
ditioned behaviors of avoidance and 
approach, more fine-grained investiga- 
tions into conditioned behavior, and 
into the anatomy of the mushroom 
body lobes, may show up the shortcom- 
ings of such scenarios: Punishment- 
learning may modulate kinds of behavior 
that relief-learning leaves unaffected and 
vice versa. 



Rat 

Punishment-learning 

Learning that a cue predicts an electric 
shock is one of the best-studied cases 
of Pavlovian conditioning in the rat 
(e.g., for reviews, see Fendt and Fanselow 
1999; LeDoux 2000; Maren 2001; Davis 



2006; Ehrlich et al. 2009; Pape and Pare 2010). The range of con- 
ditioned behaviors toward the learned cue can be understood as 
making the animal ready for the predicted aversive event and in- 
cludes protective and defense-related behaviors such as orienting, 
avoidance, freezing, and potentiation of reflexes including the 
startle response to facilitate fight-or-flight behavior, as well as au- 
tonomic changes such as tachycardia, hypertonia, and an activa- 
tion of the hypothalamic-pituitary-adrenal axis. Because this 
syndrome of conditioned effects is similar to the signs of fear in 
humans, this paradigm is typically referred to as fear con- 
ditioning. However, given that for flies there are no arguments 
for such a comprehensive similarity to human fear, and given 
that for the present cross-species review we want to use a terminol- 
ogy that is consistent across species, we will use the term punish- 
ment-learning instead of fear conditioning for the remainder of 
this contribution. 

In punishment-learning procedures for the rat, the cues to 
be learned can be visual, as already mentioned, or olfactory, tac- 
tile, auditory, or comprising contextual combinations of stimuli 
from multiple modalities, provided the respective sensory systems 
are mature (Hunt et al. 1994; Richardson et al. 1995, 2000). Rats 
can learn the association between cue and punishment, which is 
typically delivered as a foot shock via a metal floor grid, after just 
one pairing (e.g., Sacchetti et al. 1999); however, more pairings 
lead to stronger and more stable memories. Further variables 
which modulate the strength and speed of learning are the 
intensity of the cue as well as of the shock, and in particular 
the timing of cue and shock relative to each other: According 
to the predictive character of the process, learning is not best 
when shock is applied with the beginning of the cue but, rather, 
when the shock is applied upon the end of the cue ("delay" pro- 
cedure). When a temporal gap is left between cue offset and 
shock onset ("trace" procedure), rats learn less well as the dura- 
tion of the gap increases. This timing-dependency not only 
matches that discussed above for the fly, but it is indeed one of 
the rather few universally observed features of associative learn- 
ing, on the respectively adaptive time scales ranging from hun- 
dreds of milliseconds in the case of eye-blink conditioning to 
hours in the case of flavor-aversion learning (cf. Rescorla 1988, 
loc. cit. Fig. 1). 

There are different ways to test behaviorally whether the rats 
have established the association between cue and shock. Here, we 
focus on the modulation of startle amplitude as a read-out. The 
startle response (Figs. 6, 8) can be elicited by sudden and intense 
stimuli (historically, the sound of pistol shots has been used in human 
subjects [Strauss 1929]) and consists of short-latency muscle 
contractions collectively serving to protect the organism (espe- 
cially eyes and neck) and preparing it for subsequent fight-or- 
flight. It is an evolutionarily well-conserved and much-studied 
reflex (Koch 1999), such that the neural mechanisms of the startle 



Figure 6. Punishment- and relief-learning in the rat. (A) Image of an adult rat (adapted from Koch 
1999 with permission from Elsevier © 1999). (B) Histological preparation of a rat brain. Depicted is a 
Nissl-stained frontal section at the level of the amygdala and dorsal hippocampus. Intensely stained 
regions are somata-rich, lighter stain indicates fiber tracts. A typical rat brain consists of ~100 million 
neurons and is ~2.5 cm in width. (C) Apparatus for measuring the modulation of the startle response 
by relief- and punishment-learning in the rat. The animal is placed in a small enclosure (9-cm diameter, 
16-cm length). During training, light stimuli can be presented; electric shocks are administered via a 
floor grid. During the test a speaker can be used to deliver a loud noise that makes the animal 
startle, either in the presence or in the absence of the trained light stimulus. The amplitude of the 
startle response is measured by motion-sensitive transducers mounted underneath the floor grid. (D) 
Sketch of the sequence of events for relief- or punishment-learning, using a between-group design. 
For the group undergoing relief-learning, the light stimulus follows the shock, while for punishment- 
learning this sequence is reversed, such that the light stimulus precedes the shock. In the test the am- 
plitude of the startle response is measured, either in the presence or in the absence of the light stimulus. 
Relative to the startle amplitude upon the loud noise alone, the startle amplitude is attenuated after 
relief-learning (indicating positive conditioned valence), while after punishment-learning the startle 
amplitude is potentiated (indicating negative conditioned valence). Images of startled rats modified 
from Koch (1999) (adapted with permission from Elsevier © 1999). (E) Experimental data showing 
relief- or punishment-memory, depending on the inter-stimulus interval during training (the inter- 
stimulus interval [ISI] is defined as the time interval from shock onset to light onset). In order to 
display "Good" (positive conditioned valence) toward the top, startle attenuation is plotted toward 
the top of the y-axis; in turn, and in deliberate breach of the convention in the field, startle potentiation 
is plotted toward the bottom of the y-axis in order to display "Bad" (negative conditioned valence) 
toward the bottom. In line with convention, the sign of the startle modulation is presented as, respectively, 
negative or positive, because the actual behavior of the rats consists of less or more startle, respectively. 
Please note that the ISI is defined such that a negative ISI implies light → shock training, while a 
positive ISI implies shock → light training. The box plots to the left show that for an ISI of -4.5 sec, 
punishment-learning is observed in terms of potentiated startle (red), while for an ISI of 2.5 sec, 
relief-learning is observed in terms of attenuated startle (green). Notably, the two types of startle mod- 
ulation do not appear as drastically different in strength as in flies (Fig. 3). Box plots show the median as 
the bold middle line, and the 25%/75% and 10%/90% quantiles as box boundaries and whiskers, 
respectively. Sample size is N = 16 per group. (F) Event-timing and conditioned valence. Test behavior is 
plotted across the ISI. For clarity, only the median scores of startle modulation are displayed, derived 
from five experimental groups. As the light presentation is shifted in time past the shock episode 
during training, conditioned valence changes from "Bad" to "Good": it turns from startle potentiation 
to startle attenuation. Coloring implies Bonferroni-corrected significance from chance, i.e., from zero. 
Sample sizes are N = 12-16 per group. The stippled line shows the behavior of two control groups that 
had received either only the cue but no electric shocks at all, or cue and electric shocks at randomized 
ISIs. In these control groups a slight decrease in startle is observed, relative to the startle-alone testing 
condition (see text for discussion). Data in E,F taken from Andreatta et al. (2012). 



response itself and of its modulations are known in quite some de- 
tail in rodents (Fendt and Fanselow 1999; Davis 2006; Davis et al. 
2009), monkeys (Davis et al. 2008), and man (Davis et al. 2008, 
2009; van Well et al. 2012). The startle response is experimentally 
elicited by a sudden, very loud noise from a loudspeaker and can 
be measured by motion-sensitive transducers underneath the 
floor grid of the measurement apparatus (Fig. 6). This startle re- 
sponse, despite being phylogenetically ancient, is certainly not 
completely hard-wired: Although the motor programs are qualita- 
tively rather stereotyped, the amplitude of the startle response is 
higher when the animal is afraid. That is, after animals (or hu- 
mans) are trained such that they associate a cue with an electric 
shock, the amplitude of the startle response is increased in the 
presence as compared to the absence of the cue, an effect called 
fear-potentiated startle (Brown et al. 1951; Lipp et al. 1994; 
Koch 1999; Grillon 2002; Davis 2006). It is particularly important 
in the current context that the startle response has a nonzero 
baseline; that is, both increases and decreases in startle amplitude 
can be measured! Indeed, after associating a cue with a reward 
such as food or rewarding brain stimulation, the amplitude of 
the startle response is decreased in the presence of the cue (Schmid 
et al. 1995; Koch et al. 1996; Steidl et al. 2001; Schneider and 
Spanagel 2008). Thus, as in the case already discussed for flies, 
the paradigm of startle modulation is bivalent: Under neutral con- 
ditions startle is normal, while when expecting something bad 
startle is potentiated, and when expecting something good startle 
is attenuated. 
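
The bivalent read-out described above can be sketched as a simple comparison of startle amplitudes with and without the trained cue. This is a hedged illustration only: the scoring actually used in the cited studies is not specified in this section, so the difference-of-means score, the function names `startle_modulation` and `conditioned_valence`, and the example amplitudes are all hypothetical.

```python
from statistics import mean

def startle_modulation(cue_trials, noise_alone_trials):
    """Difference in mean startle amplitude: cue present minus noise alone.

    Positive = more startle with the cue, negative = less startle.
    This difference score is an assumption made for illustration.
    """
    return mean(cue_trials) - mean(noise_alone_trials)

def conditioned_valence(modulation):
    """Map the sign of the startle modulation onto conditioned valence."""
    if modulation > 0:
        return "negative valence (startle potentiated)"
    if modulation < 0:
        return "positive valence (startle attenuated)"
    return "neutral (startle unchanged)"

if __name__ == "__main__":
    # Hypothetical amplitudes in arbitrary units, not experimental data.
    noise_alone = [100.0, 110.0, 90.0]
    after_punishment_training = [150.0, 140.0, 160.0]
    after_relief_training = [70.0, 80.0, 75.0]
    for trials in (after_punishment_training, after_relief_training):
        score = startle_modulation(trials, noise_alone)
        print(score, conditioned_valence(score))
```

Because the baseline startle amplitude is nonzero, a score of this kind can move in either direction, which is what allows one and the same measure to report both punishment- and relief-memory.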

Concerning the circuits underlying punishment-learning in 
rats (Fig. 7), it is well established that the amygdala plays a 
critical role (summarized in Maren 
2001, Ehrlich et al. 2009, and Pape and 
Pare 2010). The lateral/basolateral part 
of the amygdala features as a conver- 
gence site for cue and shock processing. 
Information about the cue is carried by 
projections from the thalamic geniculate 
nucleus and the sensory cortices, while 
information about the shock is carried 
by projections from posterior intralami- 
nar nuclei of the thalamus and caudal in- 
sular cortex. During acquisition of the 
cue-shock association, long-term potentiation
of the synapses from the cortical
sensory and thalamic geniculate projec- 
tions to projection neurons within the 
lateral/basolateral amygdala is induced, 
thus effectively potentiating the cue's 
sensory pathway to the lateral/basolat- 
eral amygdala (McKernan and Shinnick- 
Gallagher 1997; Rogan et al. 1997). 
Furthermore, it is believed that long- 
term depression occurs in sensory path- 
ways mediating cues to the lateral/baso- 
lateral amygdala whose activity is un- 
or anti-correlated with shock occurrence 
(Heinbockel and Pape 2000). Thus, when 
after training the learned cue is presented 
alone, projection neurons of the lateral/ 
basolateral amygdala will be activated. 
These neurons can, in turn, activate pro- 
jection neurons in the central nucleus of 
the amygdala. The conditioned behavior 
is organized via projections from these 
central amygdala neurons toward the 
midbrain and the brainstem (Fendt and 
Fanselow 1999; LeDoux 2000; Maren 2001), i.e., the potentiation of
startle takes place by direct and indirect projections from the
central amygdala to giant neurons within the caudal pontine reticular
nucleus that in turn activate spinal motor neurons (Fendt and
Fanselow 1999; Koch 1999).

Figure 7. Simplified working model of relief- and punishment-learning in the rat. (A) During
relief-learning, the shock is presented first and the light is presented only afterward. This, we propose,
leads to memory trace formation by the coincidence of light processing and internal reinforcement
processing in the nucleus accumbens (NAC) of the striatum where some neurons are active upon shock
offset. Upon testing with the light stimulus, output from the nucleus accumbens is suggested to
impinge upon the pontine reticular nucleus (PnC) in the brainstem to mediate startle attenuation. (B)
During punishment-learning, the light is presented first and the shock is presented only afterward.
This is known to lead to memory trace formation by the coincidence of light processing and internal
reinforcement processing in the lateral amygdala (LA). Upon testing with the light stimulus, output from
the lateral amygdala, via the central amygdala (CA), also projects to the pontine reticular nucleus
(PnC) in the brainstem, but by a pathway that leads to startle potentiation. (MG) Medial geniculate
nucleus, (PIN) posterior intralaminar nuclei. (C) Local transient inactivation, using the GABA-A receptor
agonist muscimol, of either the lateral amygdala (LA) or the nucleus accumbens (NAC) during the test for
conditioned punishment or conditioned relief. Open plots refer to controls injected with saline.
Punishment-learning leads to negative conditioned valence and thus is plotted downward (startle
potentiation); relief-learning leads to positive conditioned valence and thus is plotted upward (startle
attenuation). Inactivation of the lateral amygdala abolishes punishment-memory but leaves relief-memory
intact; in turn, inactivation of the nucleus accumbens leaves punishment-memory intact yet impairs
relief-memory. Sample sizes are N = 7-9 per group. Data in C taken from Andreatta et al. (2012).

Key observations to support this working hypothesis come from
pharmacological inactivation or lesions of the lateral amygdala in
rats, which robustly block punishment-learning (Hitchcock and Davis
1986; Helmstetter and Bellgowan 1994; Muller et al. 1997).
Importantly, the functional integrity of the lateral amygdala is
necessary for both establishing and remembering cue-shock
associations (Muller et al. 1997). It has further been revealed that
long-term potentiation of the thalamic/cortical input to the
lateral/basolateral amygdala underlying punishment-learning is NMDA
receptor-dependent (Miserendino et al. 1990; Maren et al. 1996) and
is controlled by a complex network of GABA-ergic interneurons
(summarized in Ehrlich et al. 2009). Activity of these interneurons
can be modulated by several neuropeptides as well as by serotonin,
noradrenalin, and acetylcholine.

As briefly mentioned above, the amplitude of startle is not only
potentiated by learned punishment but can also be attenuated by cues
associated with reward. Such conditioned "pleasure-attenuated
startle" was first established by Schmid et al. (1995): In their
study, a light cue was repeatedly paired with a food reward. After
this association was learned, startle amplitude was attenuated by the
light cue. This effect can be blocked by lesions of the dopamine
neurons within the nucleus accumbens but not by lesions of the
amygdala (Koch et al. 1996). This first study on the neural basis of
pleasure-attenuated startle suggested that the mesoaccumbal system
that is generally crucial for reward-related learning (for reviews,
see, e.g., Cardinal and Everitt 2004; Schultz 2013; but see also
Bromberg-Martin et al. 2010; Ilango et al. 2012; Lammel et al. 2012)
is also important for pleasure-attenuated startle.

Several studies have also demonstrated that the nucleus accumbens is
able to modulate punishment-learning. For example, temporary
inactivation of the nucleus accumbens by local injections of
tetrodotoxin or of the muscarinic agonist carbachol blocks the
acquisition and expression of punishment-learning (Schwienbacher et
al. 2004, 2006; Cousens et al. 2011). However, manipulating dopamine
signaling within the nucleus accumbens has no effect on
punishment-learning (Josselyn et al. 2005; Schwienbacher et al. 2005).
This is supported by a recent study showing that the GABA-ergic
but not the dopaminergic neurons in the ventral tegmental area
(projecting to the nucleus accumbens) are activated by punishment
(Cohen et al. 2012).

Relief-learning

For the establishment of a relief-learning paradigm on the basis of 
the startle response in rats (Andreatta et al. 2012), it seemed sig- 
nificant that relief-learning in flies works best with a relatively 
long gap between electric shock offset and cue onset (5-25 sec) 
(Tanimoto et al. 2004; Yarali et al. 2008). In the vertebrate litera- 
ture this kind of procedure has been called "backward-trace" con- 
ditioning; for the remainder of this contribution, however, we use 
the term relief-learning instead, in order to apply a consistent 
nomenclature across the three species covered. In any case, such 
a relief-learning type of procedure has been employed relatively 
rarely: Often punishment and cue have a coincident onset, but 
the cue outlasts the punishment by a long time (Siegel and 
Domjan 1974; Walasek et al. 1995), or a rather short cue is present- 
ed during or after a rather long aversive stimulus (Heth and 
Rescorla 1973). Also, the question investigated has mostly been 
whether such a relief-learning type of procedure leads to less 
strong learning than punishment-learning. However, a few studies
have directly suggested a positive valence of the
learned cue after relief-learning types of procedure: Walasek and
colleagues (1995) used a 1-sec shock and a 1-min cue which had 
simultaneous onsets such that the cue outlasted the punishment 
by 59 sec. In a subsequent test session, the authors observed an in- 
crease in bar pressing for food during the presence of the cue 
(compared with bar pressing under baseline conditions), which 
was interpreted as "conditioned safety" (see Box 1). In contrast, 
punishment-learning (i.e., presentation of the 1-sec shock at the 
end of the 1-min cue) in this paradigm resulted in a strong
suppression of bar pressing. This suggests that in rats, too, changes
in the relative timing of cue and punishment do more than affect
whether and how much learning occurs; they can affect
the valence of the respective mnemonic effect (for a related
finding see also Smith and Buchanan 1954). A different approach was
used by Navratilova et al. (2012): The authors investigated tonic 
post-surgical pain and induced relief by pharmacological treat- 
ment of that pain. Using a place preference/avoidance apparatus, 
such treatment was paired with one compartment of the appara- 
tus whereas the other compartment was paired with a placebo 
treatment. In a subsequent test session, the animals preferred 
the compartment that had been paired with the pain-relief 
treatment. 

Andreatta et al. (2012) decided to use the modulation of
the startle response as a behavioral measure to compare punish- 
ment- and relief-learning because it can be modulated bivalently 
(Koch 1999); that is, negatively valenced cues, established by pun- 
ishment-learning, increase startle amplitude (Brown et al. 1951), 
whereas positively valenced cues, established by cue-reward asso- 
ciative training, decrease startle amplitude (Schmid et al. 1995). 
The relief-learning protocol was matched closely to the established
punishment-learning protocol (i.e., 15 pairings of a 5-sec
light and a 0.5-sec, 0.8-mA electric foot shock); only
the inter-stimulus interval (ISI, defined as the time interval from
shock onset to light onset) was varied (Fig. 6): Different groups of rats
received a relief-learning protocol such that the onset of the electric
shock preceded the onset of the 5-sec cue (denoted as positive
ISIs: 3, 6, 12 sec). In addition, groups were included that
underwent punishment-learning with a delay (ISI: -4.5 sec) or a
trace procedure (ISI: -12 sec). Last, but not least, control groups
received either the cue but no electric shock at all, or cue and
electric shock at randomized ISIs. In these control groups it was
observed that, relative to the startle-alone testing condition, startle
was slightly decreased when the cue was present (in Fig. 6, bottom
right, the stippled line corresponds to the mean across these
control groups and is referred to as the "baseline" in the following).
Interpretations of this baseline level might be that it represents
an unconditioned distraction effect of the light stimulus on
startle, and/or that a mild degree of conditioned safety was induced.
In any event, as the cue is moved in time toward shock onset, star- 
tle amplitude increases (Fig. 6). As the cue is moved past the shock, 
however, this effect is reversed and startle amplitude decreases. 
Importantly, for even longer gaps between shock and cue, startle 
amplitude returns to baseline levels (Andreatta et al. 2012). Thus, 
with respect to startle modulation as a measure, the cue has ac- 
quired negative conditioned valence with an ISI of -4.5 sec, but 
positive conditioned valence with an ISI of 3 sec; we refer to these 
effects as punishment- and relief-memory, respectively. 
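
The timing logic of this design can be sketched in a few lines of Python. The snippet takes the ISI as defined above (light onset minus shock onset), uses the 5-sec cue and 0.5-sec shock durations stated in the text, and derives the procedure type and the stimulus-free gap between the two events. The function name, the label strings, and the treatment of a cue starting at or during the shock as a separate "overlap" case are assumptions made for illustration, not part of the published protocol.

```python
from dataclasses import dataclass

CUE_DURATION_S = 5.0    # light cue duration stated in the text
SHOCK_DURATION_S = 0.5  # foot shock duration stated in the text


@dataclass(frozen=True)
class Timing:
    isi_s: float     # light onset minus shock onset (positive: shock first)
    procedure: str   # hypothetical label for the training procedure
    gap_s: float     # stimulus-free interval between cue and shock


def classify_isi(isi_s, cue_s=CUE_DURATION_S, shock_s=SHOCK_DURATION_S):
    """Classify a cue-shock arrangement from its inter-stimulus interval."""
    if isi_s >= shock_s:
        # Cue starts at or after shock offset: relief-type procedure.
        return Timing(isi_s, "relief", isi_s - shock_s)
    if isi_s >= 0:
        # Cue starts at shock onset or while the shock is still on.
        return Timing(isi_s, "overlap", 0.0)
    if -isi_s <= cue_s:
        # Cue starts first and is still on (or just ending) at shock onset.
        return Timing(isi_s, "punishment-delay", 0.0)
    # Cue has ended before shock onset, leaving a gap.
    return Timing(isi_s, "punishment-trace", -isi_s - cue_s)


for isi in (-12, -4.5, 3, 6, 12):
    print(classify_isi(isi))
```

Under these assumptions, an ISI of -4.5 sec makes cue and shock terminate together, and an ISI of 3 sec leaves 2.5 sec between shock offset and cue onset.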

Nucleus accumbens and amygdala are respectively 
required for relief- and punishment-memory 

Given that relief-learning in a startle paradigm can be
demonstrated in the rat, Andreatta et al. (2012) asked what its neuronal
underpinnings are. A way to probe whether a brain structure
is acutely required for a particular behavior is to inactivate it
temporarily. This can be done with optogenetic tools (for the
rat, see Zalocusky and Deisseroth 2013), or by local microinjections
of drugs that suppress neuronal firing. A suitable drug for
these purposes is the GABA-A receptor agonist muscimol, which has
previously been applied to a number of different brain structures
and behaviors (e.g., Fendt et al. 2003; Schulz et al. 2004; Müller
and Fendt 2006). Notably, these local injections silence neural
activity quickly, but their effect is transient (~120 min) (Martin
1991).

Given that relief-learning, just like reward-learning, mani- 
fests itself as a decrease in startle in the presence of the learned 
cue, it seemed plausible that brain structures concerned with re- 
ward-learning are required for relief-learning as well. The usual 
first suspect here is the nucleus accumbens, because, like its human 
terminological counterpart, the ventral striatum (which, however, 
includes the olfactory tubercle), it is a critical brain structure for or- 
ganizing learning and behavior in the appetitive domain (Ikemoto 
and Panksepp 1996, 1999; Berridge and Robinson 1998; Cardinal 
and Everitt 2004; Schultz 2013). Indeed, the decrease in startle
amplitude supported by a reward-associated cue (Schmid et al. 1995)
can be blocked by lesions of the nucleus accumbens (Koch et al. 
1996). 

To test for a role of the nucleus accumbens in relief-memory, 
cannulas aiming at the respective region were chronically and bi- 
laterally implanted. After recovering from surgery, the animals 
underwent the relief-learning procedure, without any injections. 
One day later, muscimol was injected to acutely inactivate the 
nucleus accumbens, and the ability of the learned cue to modulate 
the startle response was tested. It turned out that under such 
accumbal inactivation the learned cue does not support startle 
attenuation beyond baseline (see Fig. 7). In contrast, silenc- 
ing the nucleus accumbens did not prevent startle potentiation 
after punishment-learning. In turn, performance after punish- 
ment-learning was abolished by silencing the lateral/basolateral 
amygdala, a procedure which leaves the performance after re- 
lief-learning unaffected. Thus, there is a double dissociation be- 
tween the requirement of the nucleus accumbens and lateral/ 
basolateral amygdala for memory after relief-learning or after pun- 
ishment-learning, respectively (Andreatta et al. 2012). Notably, 
the "signature" of relief-memory thus corresponds to reward- 
memory. 

It is important to note that the above experiment specifically 
tested for effects on the expression of memory, not for effects on 
the acquisition process. Thus, the reviewed findings raise the 
question of the role of the nucleus accumbens during training, 
i.e., during the acquisition of relief-memory, as well as of the 
task-relevant pathways from the nucleus accumbens onto the
startle pathway, potentially including the ventral pallidum and/
or the pedunculopontine tegmental nucleus (Koch 1999). A per- 
haps more immediate question was whether similar neuronal 
dissociations of punishment- and relief-memory with regard to 
amygdala and nucleus accumbens are found in humans as well. 



Humans 

Punishment- and reward-learning 

In humans too, punishment-learning is most frequently studied 
in Pavlovian differential conditioning paradigms: For example, a 
geometrical shape such as a square is presented prior to a mild 
electric shock (individually calibrated to be mildly painful), while 
another stimulus, e.g., a triangle, has no such consequence. After 
several trials of training, punishment-memory can be observed on 
physiological, behavioral, and verbal levels (see below). Reward- 
learning is studied less frequently in humans because sufficiently 
strong primary rewards are not easily implemented. What has 
been employed are pleasant odors (e.g., Hermann et al. 2000; 
Gottfried et al. 2002), pleasant tastants (e.g., O'Doherty et al. 
2003), drugs activating primary-reward circuitry (e.g., Winkler 
et al. 2011), money as a secondary reward (e.g., Knutson et al. 
2001; Kirsch et al. 2003), or points worth money as a tertiary re- 
ward (e.g., Kahnt et al. 2012). 

In humans, as in rats, measures of both physiological param- 
eters (e.g., heart rate, skin conductance response) and of overt 
behavior (e.g., startle modulation) are used to assess the effects 
of punishment-learning. Sweating determined as an increase 
in the skin conductance response is the most frequently used 
physiological measure: Punishment-associated stimuli trigger 
enhanced skin conductance responses which increase with the 
aversiveness of the expected punishment (e.g., Wolter and 
Lachnit 1993). Important in the current context, however, is 
that reward-associated stimuli can trigger increases in skin con- 
ductance too (e.g., Amrhein et al. 2004; Winkler et al. 2011). 
Thus, the skin conductance response is a measure of arousal rather 
than of conditioned valence (see Lang et al. 2000). In humans as 
well, to specifically measure conditioned valence after punish- 
ment-learning, modulations of the startle reflex can be used (see 
Lang et al. 1990, 2000; Grillon and Baas 2003; Glotzbach-Schoon 
et al. 2013). Studies of human startle refer to the very rapid whole- 
body syndrome of postural changes (Fig. 8) as well as the closure 
of the eyes and accompanying changes in facial expression first de- 
scribed upon shooting a pistol (Strauss 1929; Landis and Hunt 
1939) (these early works use the German word "Zusammen- 
schrecken"). Indeed, Woodworth and Schlossberg (1956) sug- 
gested that "the most [. . .] convenient stimulus seems to be 
shooting a 22-caliber blank cartridge." Contemporary experimen- 
tal settings instead prefer a short white noise with a sudden onset 
(e.g., 95-105 dB, 50 msec) presented via headphones. It was
recognized early on (Landis and Hunt 1939) that the most reliable way to
quantify startle is to focus on the eye closure component by means 
of the electromyogram of the musculus orbicularis oculi (Fig. 
8; Blumenthal et al. 2005). Alternative ways of eliciting startle 
responses in humans are via cutaneous (i.e., mechanical stimu- 
lation to the forehead by a tap, air-puff, or mild electric shock) or 
visual stimulation (Blumenthal et al. 2005), while alternatives 
to measuring eye-blink by electromyograms are to use photoelec- 
tric or potentiometric measures (Berg and Balaban 1999; Dawson 
et al. 1999). In addition, startle entails postural (see above), auto- 
nomic (e.g., Hamm et al. 2003a), and electrocortical (e.g., Schupp 
et al. 1997) responses serving to make the subject ready for an im- 
minent fight-or-flight situation, as well as modulations of the post- 
auricular reflex (e.g., Benning et al. 2004; Franklin et al. 2013a) 
(note that the latter can reveal positive valence by increased
amplitude). In humans, as in rats, the amplitude of the startle response is
increased in the presence of a shock-associated or an aversive stim- 
ulus (Lang et al. 1990; Grillon and Baas 2003; Andreatta et al. 2010; 
Norrholm et al. 2011). In turn and important in the current con- 
text, numerous studies have revealed that startle is decreased by 
stimuli associated with reward or pleasure (e.g., Geier et al. 2000; 
Skolnick and Davidson 2002; Conzelmann et al. 2009), although 
there is a lack of pure reward-conditioning studies with the startle 
response as a dependent measure. The only such study we are 
aware of used a visual cue and a monetary reward in a reaction 
time task but failed to find an effect of the trained cue on startle re- 
sponses (Lipp et al. 1994). In any event, in humans too the startle 
response is a bivalent measure: Its potentiation or attenuation can 
indicate negative or positive conditioned valence, respectively. 

In addition to what is possible in rats (and flies, of course), 
humans can be assessed for their explicit, verbal valuation of the
conditioned stimulus. This is typically done by asking
participants to rate their feeling toward the conditioned stimulus on
a scale from "emotionally negative" to "emotionally positive." 
After conditioning, compared to before training, humans normal- 
ly rate punishment-associated stimuli as more "emotionally neg- 
ative" (e.g., Andreatta et al. 2010) and reward-associated stimuli as 
more "emotionally positive" (Cox et al. 2005). Obviously, such
explicit measures of valence are possible only in humans as they are
influenced by aspects of human nature that we do not conven- 
tionally need to reckon with in rats or flies, such as contingency 
awareness and reflectivity (see Hofmann et al. 2010). These top- 
down influences and their interaction with bottom-up processes 
are captured by dual process theories (e.g., Strack and Deutsch 
2004) proposing that human behavior is the result of the integrat- 
ed operation of impulsive and reflective processing. The reflective 
system organizes behavior intentionally, that is via decision pro- 
cesses based on knowledge about the value and the probability 
of behavioral consequences, while the impulsive system operates 
largely independently of a person's intention or goal. Even so, the 
impulsive system is thought to generate experiential corollaries, 
yet these are much less nuanced as compared to the reflective sys- 
tem. Normally, impulsive and reflective processes are well inte- 
grated and concordant and jointly organize behavior: Within 
dual-process theories, the increase or decrease in startle amplitude 
after punishment- or reward-learning reflects impulsive valua- 
tion, while the ratings of the conditioned stimuli track a concor- 
dant reflective process. However, as the discussion below will 
show, under some conditions one has to reckon with discordance 
among these systems. 

The neuronal substrates of punishment-learning and startle 
potentiation are strikingly similar in man, monkey, and rat, argu- 
ably because the underlying circuitry and its modulation are evo- 
lutionarily ancient and conserved (see Lang et al. 2000; Davis 
et al. 2008, 2009). Functional magnetic resonance imaging has 
confirmed amygdala activity triggered by punishment-associated 
stimuli as compared to neutral stimuli (e.g., Büchel et al. 1998;
LaBar et al. 1998), even if subjects were not aware of stimulus pre- 
sentation due to masking (Morris et al. 1998). In addition, insula 
and anterior cingulate cortex were found by most studies to be in- 
volved in punishment-learning (for review, see Sehlmeyer et al. 
2009). The necessity of the amygdala for punishment-learning 
in humans is also confirmed by studies examining patients with 
specific lesions. The seminal study of Bechara et al. (1995) revealed 
a double dissociation by showing that a patient with bilateral 
amygdala lesions did not acquire conditioned autonomic fear 
responses but had declarative knowledge of contingencies, while 
a patient with bilateral hippocampus lesions failed to learn con- 
tingencies but did acquire conditioning. Further studies with 
amygdala-lesioned patients revealed that these patients do not 
show startle potentiation either in response to punishment itself
(Angrilli et al. 1996) or to punishment-associated stimuli, even if
they acquire declarative knowledge of contingencies (Weike et al. 
2005). In turn, punishment-associated stimuli can cause an in- 
crease in startle in subjects with intact amygdalae yet unaware 
of stimulus presentation due to cortical blindness (Hamm et al. 
2003b). Thus, startle potentiation, an implicit measure of condi- 
tioned valence after punishment-learning, engages and requires 
the amygdala, and can take place without contingency awareness 
and without concordant explicit valuation. 

The neuronal substrates of human reward-learning studied 
by functional magnetic resonance imaging include the ventral 
striatum (which in humans includes the nucleus accumbens) 
and the orbitofrontal cortex: Gottfried et al. (2002) paired three 
neutral faces with pleasant, neutral, or unpleasant odors and 
found that appetitive olfactory learning led to activity in the me- 
dial orbitofrontal cortex and nucleus accumbens as well as in 
the amygdala. O'Doherty et al. (2002) examined brain activity 
elicited by visual stimuli previously paired with either a pleasant 
sweet taste, a moderately aversive low- 
salt taste, or no taste. When anticipating 
reward, activity in the ventral striatum 
and orbitofrontal cortex was found (for 
a related finding using monetary reward, 
see Kirsch et al. 2003); in addition, activi- 
ty in the amygdala was reported under 
such conditions. Of these activations, 
only orbitofrontal cortex activation, ar- 
guably in another region, was seen upon 
the actual delivery of reward (for a report 
confirming this double role of the orbito- 
frontal cortex, using a monetary reward, 
see Cox et al. 2005). Perhaps more im- 
portant in the current context, the only 
regions activated in accordance with 
an appetitive prediction error signal, 
that is, the difference between expected
and received reward, are the ventral stria- 
tum and orbitofrontal cortex (O'Doherty 
et al. 2003). Thus, although studies on the 
necessity of these regions in human 
primary-reward learning are lacking (but 
see, e.g., Bechara et al. 1999; Tsuchida 
et al. 2010) and although the boundary 
conditions for the engagement of the 
amygdala in particular still need to be bet- 
ter understood, the underlying circuitry 
in man may well turn out to be very 
similar to that in the rat in the case of 
reward-learning too.



In order to fully appreciate the mnemonic consequences of an adverse
event such as an electric shock, Andreatta et al. (2012, 2013) decided
to test for relief-learning in humans as well, using a paradigm as
closely related to the studies in the rat as possible. Two groups of
subjects underwent discrimination learning of visual geometrical
shapes, one of which became associated with moderately painful
electric shocks. What differed between the groups was the relative
timing of the visual stimuli and the shocks (Fig. 8): Either one
visual shape preceded the shocks (punishment-learning), or it was
presented upon the offset of the shocks (relief-learning). Afterward,
the implicit and explicit valences of the stimuli were measured by
their capacity to modulate the startle response and by ratings,
respectively. It turned out (Fig. 8) that if associated with shock
onset, the visual stimulus subsequently potentiated startle,
indicating negative conditioned valence; in turn, if associated with
pain offset, the visual stimulus subsequently attenuated startle,
indicating positive conditioned valence.

Interestingly, although in Andreatta et al. (2010) the subjects
showed potentiated startle after punishment-learning (negative
valence) and attenuated startle after relief-learning (positive
valence), and although these implicit measures of conditioned
valence match the findings in rat and fly, it then turned out
that, irrespective of whether the visual stimulus had been used
in punishment- or in relief-learning, the subjects explicitly rated
it as "emotionally negative" (but see Zanna et al. 1970).
Such a discordance between implicit and explicit conditioned
valence led to the question whether in humans the cortical
"footprint" of the learned visual stimulus would be punishment-like
or reward-like, and would thus correspond to explicit or
implicit valence judgments. It was found (Fig. 8) that after punish- 
ment-learning the learned stimulus activated the amygdala as 
part of a conditioned fear network; in contrast, after relief-learn- 
ing it activated the striatum, i.e., part of a reward network 
(Andreatta et al. 2012). Thus, in the case of punishment-learning 
there was a concordance between neural activation, implicit va- 
lence, and explicit valence of the learned stimulus. In contrast, af- 
ter relief-learning the negative explicit valence of the learned 
stimulus appeared discordant with its reward-like neuronal signa- 
ture and its positive implicit valence. Notably, after both punish- 
ment- and relief-learning an activation of the insula was found 
(Andreatta et al. 2012). Given that humans explicitly judged 
both punishment- and relief-conditioned cues as emotionally 
negative in that study, it is tempting to speculate that the insula 
is involved in the generation of emotional valence experience 
(see also Craig 2009) and/or in the experience of excitement
independent of valence (Elliott et al. 2000). Recently, a follow-up study
(Andreatta et al. 2013) suggested that both the nature of the task 
as a between- versus a within-subject task and the predictability of 
the shock modify explicit valence in relief-learning: In a newly 
introduced within-subjects design, both implicit and explicit va- 
lence measures after relief-learning were positive if shock occur- 
rence was predictable. These findings call for a full clarification 
of the experimental boundary conditions, causes, and psychiatric 
implications of the observed dissociation between explicit and 
implicit valence of stimuli associated with relief. 

We note that previous studies, although related, used ap- 
proaches critically different from the relief-learning approach 
which is the focus of the present review (see also the discussions 
in Riebe et al. 2012; Kong et al. 2013): Seymour et al. (2005) exam- 
ined brain responses triggered by stimuli predicting a reduction of 
tonic pain, Leknes et al. (2011) used a procedure involving the 



Figure 8. (A) Body posture upon a pistol shot illustrating the startle response pattern (from Landis 
and Hunt 1 939) initially described as "Zusammenschrecken" by Strauss (1 929). Together with the im- 
mediate, brief concomitant closure of the eyes, startle serves to protect from imminent threat and to 
prepare the subject for fight-or-flight behavior. (B) Magnetic resonance image of a human brain, 
coronal slice. (C) Experimental setup for relief- and punishment-learning in humans. The subject 
faces a computer monitor (not shown) on which visual stimuli can be displayed. During training 
mild electric shock punishment can be delivered to the left forearm. During the test, a loud noise 
can be delivered through the earphones to induce the startle response while the subjects watch the 
screen on which visual stimuli are displayed. Measurement electrodes (blue) record the eye-closure 
component of the startle response via electromyography (EMG) from the musculus orbicularis oculi. 
(D) Sketch of the sequence of events for relief- or punishment-learning, using a between-group 
design. During training, in both groups a control stimulus (e.g., a triangle) is presented temporally 
removed from shock. In the relief-learning group, a second geometrical stimulus (e.g., a circle) is pre- 
sented upon cessation of shock (ISI = 6 sec). During the test the startle amplitude, evoked by a loud 
noise from the earphones and measured by the eye-EMC, is less when subjects are viewing the relief- 
trained than when viewing the control stimulus. In contrast, in the punishment-learning group the 
second stimulus (e.g., a square) is presented before shock (ISI = -8 sec). During the test the startle am- 
plitude is higher when subjects are viewing the punishment-trained than when viewing the control 
stimulus. Note that the experimental role of the geometrical shapes is counterbalanced across subjects. 
(£) Experimental data showing relief- or punishment-memory, dependent on the relative timing of the 
visual stimulus and shock during training. As in Figure 6, positive conditioned valence ("Good") is 
plotted toward the top of the y-axis indicating the degree of startle attenuation; in turn, startle poten- 
tiation is plotted toward the bottom of the y-axis in order to display negative conditioned valence 
("Bad") toward the bottom. The sign of the startle modulation is presented as, respectively, negative 
or positive, because the actual behavior of the subjects consists of less or more startle, respectively. 
Box plots show the median as the bold middle line, and the 25%/7S% and 10%/90% quantiles as 
box boundaries and whiskers, respectively. Data taken from Andreatta et al. 2010, sample sizes are 
N=34 and N=33 for the punishment- and the relief-learning groups, respectively. (F) After 
punishment-learning, the learned visual stimulus induces activation of the right amygdala (left 
panel), but not of the striatum. In contrast, a relief-conditioned visual stimulus induces activation of 
the right striatum (right panel), but not of the amygdala; striatum activation extends to the ventral stri- 
atum/nucleus accumbens. Both the punishment- and the relief-conditioned stimulus induce activation 
of the left insula as well (not shown). From Andreatta et al. (2012), sample sizes are N = 14 for both the 
punishment- and the relief-learning groups. 
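
The plotting convention described in panel (E) can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis used by Andreatta et al.: the function names, the per-subject difference score (trained minus control), and the example values are assumptions introduced here, and the quantile interpolation method is a choice of this sketch rather than something stated in the caption.

```python
from statistics import median, quantiles


def startle_modulation(trained, control):
    """Per-subject difference score (assumed definition): startle amplitude
    while viewing the trained stimulus minus amplitude while viewing the
    control stimulus. Negative = attenuation, positive = potentiation."""
    if len(trained) != len(control):
        raise ValueError("need one trained and one control value per subject")
    return [t - c for t, c in zip(trained, control)]


def conditioned_valence(modulation):
    """Map the sign of the modulation onto the caption's labels:
    less startle ("Good", plotted toward the top), more startle ("Bad")."""
    if modulation < 0:
        return "Good"
    if modulation > 0:
        return "Bad"
    return "Neutral"


def box_summary(values):
    """Median, 25%/75% quantiles (box boundaries), 10%/90% quantiles (whiskers)."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(values, n=20, method="inclusive")
    return {
        "q10": cuts[1],
        "q25": cuts[4],
        "median": median(values),
        "q75": cuts[14],
        "q90": cuts[17],
    }


if __name__ == "__main__":
    # Made-up amplitudes in arbitrary units, for illustration only.
    trained = [40, 45, 50, 55, 60]
    control = [50, 50, 50, 50, 50]
    scores = startle_modulation(trained, control)
    print(scores)
    print(box_summary(scores))
    print([conditioned_valence(s) for s in scores])
```

Because attenuation carries a negative sign yet denotes positive valence, a plot following the caption's convention would invert the y-axis so that negative scores appear toward the top.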



omission of a painful stimulus, and Kim et al. (2006) as well as 
Levita et al. (2012) studied brain responses during instrumental 
avoidance of an aversive outcome. Interestingly, reward centers 
of the brain, i.e., ventral striatum and orbitofrontal cortex, seem 
to be implicated in all these cases (see also the discussion of the 
role of these activations in active avoidance learning in Ilango 
et al. 2012). To study the unconditioned implicit valence of 
pain offset itself, Franklin et al. (2013a) presented the startle noise 
immediately after a painful stimulus and measured both the eye- 
blink and the post-auricular component of the startle response. 
Startle eye blink was reduced and startle post-auricular reactivity 
enhanced, arguing not only for a reduction in negative affect 
but for a genuinely positive affective component several seconds 
after pain offset. Lastly, Fujiwara et al. (2009) implemented relief 
in a very different, arguably higher-order sense: Subjects were in- 
formed post hoc whether their actual choice had yielded them a 
better ("relief") or worse ("regret") monetary feedback than if 
they had made an alternative choice. Under thus-defined relief 
conditions, activity in anterior ventrolateral prefrontal cortex 
was observed; notably, this held true for relief cases implemented 
by means of both more gain and less loss. 

In any event, on the basis of the reviewed findings, we pro- 
pose as a working hypothesis that pain offset activates at least 
parts of the accumbal reward circuitry and that these activations 
induce relief-learning. The learned cue is thus endowed with the 
capacity to trigger conditioned relief, which is entangled with 
the activity of much of that same circuitry. Consistent with this 
working hypothesis, firing of dopaminergic neurons as well as 
the hemodynamic activity of their target regions, such as the ven- 
tral striatum, can be increased not only by reward (for review, see 
Schultz 2013) but also by the reduction (Seymour et al. 2005) or 
upon the termination (Becerra and Borsook 2008; Brischoux 
et al. 2009; Fiorillo et al. 2013a,b) of an adverse event. Such off- 
set-activation is particularly obvious when the end of the adverse 
event is indeed clearly defined in time, as is the case for air puff but 
arguably not for taste cues such as salty 
and in particular bitter (Fiorillo et al. 
2013b, loc. cit. Fig. 2A). Whether the 
omission of punishment activates these 
neurons as well is a matter of controversy 
(e.g., Leknes et al. 2011; Fiorillo 2013; 
Fiorillo et al. 2013a,b). 

Our working hypothesis must not 
be construed as equating relief with re- 
ward (indeed in flies at least considerable 
discrepancies are observed: see section 
"Fly"); rather, we argue that some of the 
neuronal footprint is shared between re- 
lief and reward, prompting the question 
how far these commonalities extend, 
both in the neuronal and in the psycho- 
logical domain — and how the differences 
between relief and reward are neuronally 
and psychologically structured. 



Relevance for defensive behavior: extending the threat-imminence model to post-strike 

Faculties such as punishment-learning 
have evolved because of their benefits 
to survival, but of course there are few 
electric shocks in nature — except per- 
haps lightning or the defense systems 
of electric fish. Of greater concern are, for 
example, predators, parasites, "armed" 
insects, such as mosquitos, ants, or bees, competitors, physical 
events such as forest fires, as well as irritant, toxic, or, in the 
case of flies, even predatory plants. In each case, it is adaptive to 
learn the cues predictive of the threat in order to gain time for de- 
ciding on the most appropriate action and/or bodily preparing for 
it. Such coping behavior includes risk assessment, avoidance/ 
flight, and approach/fight. Indeed, according to the threat- 
imminence model (Fanselow 1994), defensive behavior in rats — 
and arguably in humans as well (see, e.g., Low et al. 2008) — is or- 
ganized in three stages, depending on the temporal and spatial 
proximity of threat (see also Blanchard and Blanchard 1989): 
pre-encounter, post-encounter, and strike phase. If the sub- 
ject registers an increased likelihood of threat in the pre- 
encounter phase, this results in increased general alertness, undi- 
rected search behavior, orienting, and risk assessment; this phase 
in humans is entangled with feelings of anxiety. The post- 
encounter phase is characterized by freezing, reflex potentiation 
including startle, and selective attention to the encountered 
threat (interestingly, the silence that ensues upon the cessation 
of movement during freezing may, in turn, be a signal for threat 
to conspecifics [Pereira et al. 2012], in particular in combination 
with ultrasonic distress vocalizations emitted into this silence 
[Wöhr and Schwarting 2013]). The strike phase is associated 
with directed fight/flight behavior; in humans both the post- 
encounter and strike phase are characterized by the emotion of 
fear. While it is unknown which brain structures mediate the 
pre-encounter phase, the amygdala and the ventral periaqueduc- 
tal gray are critical for the post-encounter phase, and the dorsal 
periaqueductal gray as well as the superior colliculus are impor- 
tant for the strike period. On the basis of the reviewed data, this 
three-stage model should be extended (see also the insightful ac- 
count of behavior organization put forth by Craig 1918) to in- 
clude a post-strike phase characterized by the emotion of relief, 
behaviorally expressed as startle attenuation and mediated by 
the nucleus accumbens. The positive valence of this phase is bio- 
logically reasonable because it can reinforce those behaviors 
that helped to escape or otherwise master and survive the threat 
(Smith and Buchanan 1954; Ilango et al. 2012) and the learning 
of the cues associated with its disappearance: Subsequently, 
learned approach behavior to such cues would decrease the prob- 
ability of encountering the threat again. Such an extension of the 
threat-imminence model should provide an integrated, conceptu- 
ally, and biologically meaningful framework for both punish- 
ment- and relief-learning in rats and humans. Furthermore, it 
should prompt Drosophila researchers to characterize the defen- 
sive behaviors of flies toward threats as well as toward shock- 
predicting stimuli in more detail than has been customary to 
date (for a pioneering analysis, see Chabaud et al. 2010). This 
should reveal whether invertebrate defensive behavior follows 
the same general principles. Indeed, using terminology that 
may appear old fashioned today, the early neuroethologists 
around Craig, von Holst, Tinbergen, von Frisch, and Lorenz 
were applying a hierarchical and sequential framework of behav- 
ior organization featuring the steps of (1) need-related, spatially 
undirected search behavior and attention, (2) spatially directed 
orientation behavior and attention, and (3) eventual action and 
the embodiment of consequences (Craig 1918; Lorenz and 
Tinbergen 1938; Bullock and Horridge 1965; Lorenz 1973). 

Perspectives for applied psychology 

The reviewed data suggest that considering relief-learning can also 
help us to understand some features of odd and/or pathological 
behavior. For example, "having survived" a roller-coaster ride or 
extreme sports may support relief-learning by activating reward 
circuitry, reinforce these activities, give the situation an implicit 
positive valence, and thus explain an otherwise irrational ten- 
dency to perform such dangerous activities again (Solomon and 
Corbit 1974). Likewise, self-cutting and related nonsuicidal self- 
injury may be operant behaviors carried out in order to access relief 
(Franklin et al. 2013a,b). Moreover, the relief experienced by a 
hostage following acute repeated death threats may help explain 
the development of positive feelings toward the hostage taker 
("Stockholm syndrome"). Relief-learning may also contribute 
to the attraction of anxiety patients to cues associated with fear off- 
set such as medication, or the hospital or therapy environment, 
and may explain why items not plausibly causally related to but 
temporally linked with fear offset can become fear-fighting talis- 
mans. Similar scenarios may apply regarding the offset of traumat- 
ic events or of panic attacks and the safety-seeking behavior that 
may emerge. Further, hypervigilance to signals of upcoming threat 
(Bishop 2007) together with a deficit in the perception of (or defi- 
cits in learning about) offset signals may turn a bad experience into 
an unbearable one (see also Box 2), and thus contribute to the sus- 
ceptibility for post-traumatic stress or panic disorder (Mineka and 
Oehlberg 2008; Jovanovic et al. 2012). For example, patients suffering 
from post-traumatic stress disorder express not only enhanced 
fear-learning but also a reduced response to safety signals 
(Jovanovic et al. 2012). Furthermore, successful safety-seeking 
behavior may contribute to the maintenance of anxiety disorders 
such as panic disorders or agoraphobia (Lohr et al. 2007). We note 
that in flies at least (and preliminary data suggest that it is similar in 
rats) memories after relief-learning are temporally more stable 
than memories after punishment-learning relatively early (up to 
2 h) after conditioning (Yarali et al. 2008; Diegelmann et al. 
2013b). If this were the case in humans, too, deficiencies in 
relief-learning might render subjects more susceptible in particu- 
lar to repeated adverse experiences occurring in relatively quick 
succession, such that the net effect of a sequence of grave experi- 
ences is turned into an unbearable, traumatic one. 

To summarize, relief-learning can be observed in species as 
distantly related as flies, rats, and man. This draws attention to 
the mnemonic processes related to the offset of reinforcers; in- 
deed, these processes may prove to be no less universally con- 
served than those related to reinforcement onset (Rescorla 1988) 
and thus lend themselves to translational research. Specifically, 
we believe that the full range of the behavioral and psychologi- 
cal consequences of painful and/or traumatic experiences cannot 
be appreciated without taking relief-learning into account. As 
we argue, relief-learning should be considered in particular 
when trying to understand the development and maintenance 
of both adaptive avoidance behavior and of pathological condi- 
tions such as exaggerated risk-taking, self-injury, excessive safety- 
seeking, post-traumatic stress disorder as well as panic disorders. 
In all these cases, attention in particular seems warranted as to 
whether or not modifications of these behaviors or the considered 
therapy are differentially effective for fear- and relief-memories. 
To this end, basic and translational research into the cellular, mo- 
lecular, and genetic bases for punishment- and relief-learning is 
both useful and fascinating. 
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